perm filename V2H.IN[TEX,DEK] blob sn#359279 filedate 1977-07-18 generic text, type C, neo UTF8
Contents:
folio 344 galley 1
folio 347 galley 2
folio 350 galley 3 (WARNING: Much of this tape unreadable!)
folio 354 galley 4
folio 357 galley 5
folio 360 galley 6
folio 363 galley 7
folio 366 galley 8
folio 370 galley 9
folio 372 galley 10
folio 376 galley 11 (WARNING: Some bad spots on this tape.)
folio 379 galley 12
folio 382 galley 13
folio 385 galley 14
folio 388 galley 15
folio 392 galley 16
folio 394 galley 17
folio 395 galley 18
folio 344 galley 1
W58320 Computer Programming (Knuth/Addison-Wesley), f. 344, ch. 4, g. 1b

It is a straightforward matter to apply the classical algorithms for integers to problems involving numbers with embedded radix points, or rational numbers, or extended-precision floating-point numbers, in the same way as the arithmetic operations defined for integers in MIX are applied to these more general problems.

In this section we shall study algorithms which do operations (a), (b), and (c) above for integers expressed in radix-$b$ notation, where $b$ is any given integer $\ge 2$. Thus the algorithms are quite general definitions of arithmetic processes, and as such they are unrelated to any particular computer. But the discussion in this section will also be somewhat machine-oriented, since we are chiefly concerned with efficient methods for doing high-precision calculations by computer. Although our examples are based on the mythical MIX computer, essentially the same considerations apply to nearly every other machine. For convenience, let us assume first that we have a computer (like MIX) which uses the signed-magnitude representation for numbers; suitable modifications for complement notations are discussed near the end of this section.

The most important fact to understand about extended-precision numbers is that they may be regarded as numbers written in radix-$w$ notation, where $w$ is the computer's word size. For example, an integer which fills 10 words on a computer whose word size is $w = 10^{10}$ has 100 decimal digits; but we will consider it to be a 10-place number to the base $10^{10}$. This viewpoint is justified for the same reason that we may convert, say, from binary to octal notation, simply by grouping the bits together. (See Eq. 4.1–5.)
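
The grouping viewpoint is easy to try out. The following Python sketch is only an illustration (the helper names `to_digits` and `from_digits` are ours, not part of the text): it splits a nonnegative integer into radix-$b$ places, most significant first, and reassembles it, so that a 100-digit number becomes a 10-place number to the base $10^{10}$.

```python
def to_digits(x: int, b: int) -> list[int]:
    """Radix-b places of x >= 0, most significant first."""
    if x < 0 or b < 2:
        raise ValueError("need x >= 0 and b >= 2")
    places = []
    while True:
        x, r = divmod(x, b)
        places.append(r)
        if x == 0:
            break
    return places[::-1]


def from_digits(places: list[int], b: int) -> int:
    """Inverse of to_digits: evaluate (d1 d2 ... dn)_b by Horner's rule."""
    x = 0
    for d in places:
        x = x * b + d
    return x


big = 10**100 - 1                  # a 100-digit decimal number
places = to_digits(big, 10**10)    # viewed as a number to the base 10**10
print(len(places))                 # 10
print(from_digits(places, 10**10) == big)   # True
print(to_digits(0b101110, 8))      # [5, 6]: bits grouped in threes give octal
```

Python's unbounded integers merely stand in for the high-precision values here; the point is the change of radix, not efficiency.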

In these terms, we are given the following primitive operations to work with:

a₀) addition or subtraction of one-place integers, giving a one-place answer and a carry;
b₀) multiplication of a one-place integer by another one-place integer, giving a two-place answer;
c₀) division of a two-place integer by a one-place integer, provided that the quotient is a one-place integer, and yielding also a one-place remainder.

By adjusting the word size, if necessary, nearly all computers will have these three operations available, and so we will construct our algorithms (a), (b), and (c) mentioned above in terms of the primitive operations (a₀), (b₀), and (c₀).

Since we are visualizing extended-precision integers as base-$b$ numbers, it is sometimes helpful to think of the situation when $b = 10$, and to imagine that we are doing the arithmetic by hand. Then operation (a₀) is analogous to memorizing the addition table; (b₀) is analogous to memorizing the multiplication table; and (c₀) is essentially memorizing the multiplication table in reverse. The more complicated operations (a), (b), (c) on high-precision numbers can now be done using the simple addition, subtraction, multiplication, and long-division procedures we are taught in elementary school. In fact, most of the algorithms we shall discuss in this section are essentially only mechanizations of familiar pencil-and-paper operations. Of course, we must state the algorithms much more precisely than they have ever been stated in the fifth grade, and we should also attempt to minimize computer memory and running time requirements.
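
As a concrete model of the primitive operations (a₀), (b₀), and (c₀), here is a small Python sketch. The function names are hypothetical; each one merely checks that its operands are one-place radix-$b$ integers and returns one-place results, mimicking what a machine with word size $b$ provides directly.

```python
def add_a0(x: int, y: int, b: int) -> tuple[int, int]:
    """(a0) addition: return (one-place sum, carry), carry in {0, 1}."""
    assert 0 <= x < b and 0 <= y < b
    s = x + y
    return s % b, s // b


def sub_a0(x: int, y: int, b: int) -> tuple[int, int]:
    """(a0) subtraction: return (one-place difference, borrow), borrow in {0, 1}."""
    assert 0 <= x < b and 0 <= y < b
    d = x - y                      # -b < d < b
    return d % b, 1 if d < 0 else 0


def mul_b0(x: int, y: int, b: int) -> tuple[int, int]:
    """(b0) multiplication: return the two-place product as (high, low)."""
    assert 0 <= x < b and 0 <= y < b
    return divmod(x * y, b)        # x*y <= (b-1)**2 < b*b


def div_c0(hi: int, lo: int, y: int, b: int) -> tuple[int, int]:
    """(c0) division of the two-place (hi lo)_b by a one-place y > 0.

    Defined only when the quotient fits in one place, which holds exactly
    when hi < y.  Returns (quotient, remainder), both one-place.
    """
    assert 0 <= hi < b and 0 <= lo < b and 0 < y < b
    if hi >= y:
        raise OverflowError("quotient would need two places")
    return divmod(hi * b + lo, y)


# In decimal (b = 10) these are the familiar tables from school:
print(add_a0(7, 8, 10))      # (5, 1)
print(mul_b0(9, 9, 10))      # (8, 1)
print(div_c0(3, 7, 4, 10))   # (9, 1): 37 = 9*4 + 1
```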

To avoid a tedious discussion and cumbersome notations, let us assume that all numbers we deal with are nonnegative. The additional work of computing the signs, etc., is quite straightforward, and the reader will find it easy to fill in any details of this sort.

First comes addition, which of course is very simple, but it is worth studying since the same ideas occur in the other algorithms also:

Algorithm A (Addition of nonnegative integers). Given nonnegative $n$-place integers $u_1u_2\ldots u_n$ and $v_1v_2\ldots v_n$ with radix $b$, this algorithm forms their sum, $(w_0w_1w_2\ldots w_n)_b$. (Here $w_0$ is the "carry," and it will always be equal to 0 or 1.)

A1. [Initialize.] Set $j \leftarrow n$, $k \leftarrow 0$. (The variable $j$ will run through the various digit positions, and the variable $k$ keeps track of carries at each step.)

A2. [Add digits.] Set $w_j \leftarrow (u_j + v_j + k) \bmod b$, and $k \leftarrow \lfloor (u_j + v_j + k)/b \rfloor$. (In other words, $k$ is set to 1 or 0, depending on whether a "carry" occurred or not, i.e., whether $u_j + v_j + k \ge b$ or not. At most one carry is possible during the two additions, since we always have $u_j + v_j + k \le (b-1) + (b-1) + 1 < 2b$.)

A3. [Loop on $j$.] Decrease $j$ by one. Now if $j > 0$, go back to step A2; otherwise set $w_0 \leftarrow k$ and terminate the algorithm.

For a formal proof that Algorithm A is valid, see exercise 4.
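
A direct transcription of Algorithm A into Python may help in following the steps. This is our own sketch, not part of the original text: digits are stored most significant first in ordinary lists, so `u[j-1]` holds $u_j$, and the function name `add_nonneg` is hypothetical.

```python
def add_nonneg(u: list[int], v: list[int], b: int) -> list[int]:
    """Algorithm A: (w0 w1 ... wn)_b = (u1 ... un)_b + (v1 ... vn)_b.

    u[j-1] holds u_j, so u[0] is the most significant place; the result
    list has n + 1 entries and w[0] is the final carry (0 or 1).
    """
    n = len(u)
    assert n >= 1 and len(v) == n
    w = [0] * (n + 1)
    k = 0                                  # A1. Initialize: k <- 0.
    for j in range(n, 0, -1):              # A3 loop: j = n, n-1, ..., 1
        s = u[j - 1] + v[j - 1] + k        # A2. At most 2b - 1.
        w[j] = s % b
        k = s // b                         # carry: 1 if s >= b, else 0
    w[0] = k                               # A3. Store the final carry.
    return w


print(add_nonneg([9, 9, 7], [0, 0, 5], 10))   # [1, 0, 0, 2]
```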

A MIX program for this addition process might take the following form:

Program A (Addition of nonnegative integers). Let LOC($u_j$) ≡ U + $j$, LOC($v_j$) ≡ V + $j$, LOC($w_j$) ≡ W + $j$, rI1 ≡ $j$, rA ≡ $k$, word size ≡ $b$, N ≡ $n$.

          ENT1  N       1        A1. Initialize. j ← n.
          JOV   OFLO    1        Ensure overflow is off.
    1H    ENTA  0       N+1-K    k ← 0.
          J1Z   3F      N+1-K    To A3 if j = 0.
    2H    ADD   U,1     N        A2. Add digits.
          ADD   V,1     N
          STA   W,1     N
          DEC1  1       N        A3. Loop on j.
          JNOV  1B      N        If no overflow, set k ← 0.
          ENTA  1       K        Otherwise, set k ← 1.
          J1P   2B      K        To A2 if j ≠ 0.
    3H    STA   W       1        Store final carry in w0.

The running time for this program is $10N + 6$ cycles, independent of the number of carries, $K$. The quantity $K$ is analyzed in detail at the close of this section.

Many modifications of Algorithm A are possible, and only a few of these are mentioned in the exercises below. A chapter on generalizations of this algorithm might be entitled, "How to design adding circuits for a digital computer."

The problem of subtraction is similar to addition, but the differences are worth noting:

Algorithm S (Subtraction of nonnegative integers). Given nonnegative $n$-place integers $u_1u_2\ldots u_n \ge v_1v_2\ldots v_n$ with radix $b$, this algorithm forms their nonnegative difference, $(w_1w_2\ldots w_n)_b$.

S1. [Initialize.] Set $j \leftarrow n$, $k \leftarrow 0$.

S2. [Subtract digits.] Set $w_j \leftarrow (u_j - v_j + k) \bmod b$, and $k \leftarrow \lfloor (u_j - v_j + k)/b \rfloor$. (In other words, $k$ is set to $-1$ or 0, depending on whether a "borrow" occurred or not, i.e., whether $u_j - v_j + k < 0$ or not. In the calculation of $w_j$ note that we must have
$$-b = 0 - (b-1) + (-1) \le u_j - v_j + k \le (b-1) - 0 + 0 < b;$$
hence $0 \le u_j - v_j + k + b < 2b$, and this suggests the method of computer implementation explained below.)

S3. [Loop on $j$.] Decrease $j$ by one. Now if $j > 0$, go back to step S2; otherwise terminate the algorithm. (When the algorithm terminates, we should have $k = 0$; the condition $k = -1$ will occur if and only if $v_1\ldots v_n > u_1\ldots u_n$, and this is contrary to the given assumptions. See exercise 12.)
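
Algorithm S can likewise be written in a few lines of Python. In this sketch (the function name `sub_nonneg` is hypothetical; digits are most significant first, so `u[j-1]` holds $u_j$), Python's floor division and nonnegative `%` produce exactly the $k$ and $w_j$ of step S2.

```python
def sub_nonneg(u: list[int], v: list[int], b: int) -> list[int]:
    """Algorithm S: (w1 ... wn)_b = (u1 ... un)_b - (v1 ... vn)_b, u >= v.

    u[j-1] holds u_j (most significant place first).  Raises ValueError if
    the final borrow shows that v > u, contrary to the assumptions.
    """
    n = len(u)
    assert n >= 1 and len(v) == n
    w = [0] * n
    k = 0                                  # S1. k is 0 or -1.
    for j in range(n, 0, -1):              # S3 loop: j = n, ..., 1
        s = u[j - 1] - v[j - 1] + k        # S2. -b <= s < b
        w[j - 1] = s % b                   # Python's % is never negative here
        k = s // b                         # floor division: -1 or 0
    if k != 0:
        raise ValueError("v > u")
    return w


print(sub_nonneg([1, 0, 0, 2], [0, 0, 0, 5], 10))   # [0, 9, 9, 7]
```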

In a MIX program to implement subtraction, it is most convenient to retain the value $1 + k$ instead of $k$ throughout the algorithm, so that we can calculate $u_j - v_j + (1 + k) + (b - 1)$ in step S2. (Recall that $b$ is the word size.) This is illustrated in the following code:

Program S (Subtraction of nonnegative integers). This program is analogous to Program A; we have rI1 ≡ $j$, rA ≡ $1 + k$. Here, as in other programs of this section, location WM1 contains $b - 1$, the largest possible value that can be stored in a MIX word; cf. Program 4.2.3D, lines 38–3…

          ENT1  N       1        S1. Initialize. j ← n.
          JOV   OFLO    1        Ensure overflow is off.
    1H    J1Z   DONE    K+1      Terminate if j = 0.
          ENTA  1       K        Set k ← 0.
    2H    ADD   U,1     N        S2. Subtract digits.
          SUB   V,1     N        Compute u_j - v_j + k + b.
          ADD   WM1     N
          STA   W,1     N        (May be minus zero.)
          DEC1  1       N        S3. Loop on j.
          JOV   1B      N        If overflow, set k ← 0.
          ENTA  0       N-K      Otherwise, set k ← -1.
          J1P   2B      N-K      Back to S2.
          …
folio 347 galley 2

The running time for this program is $12N + 3$ cycles, which is slightly longer than that for Program A.
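
The reason for keeping $1 + k$ in rA can be checked with a small simulation. The sketch below (hypothetical names; it models only the arithmetic, not MIX's sign handling such as minus zero) forms $u_j - v_j + (1 + k) + (b - 1)$, which by the bounds derived in step S2 always lies in $[0, 2b)$. A result of $b$ or more corresponds to an overflow in Program S and means that no borrow occurred.

```python
def sub_biased(u: list[int], v: list[int], b: int) -> list[int]:
    """Algorithm S with the register convention of Program S.

    a plays the role of rA = 1 + k: a = 1 means "no borrow", a = 0 means
    "borrow".  Each step forms s = u_j - v_j + a + (b - 1), which equals
    u_j - v_j + k + b and so lies in [0, 2b).
    """
    n = len(u)
    assert n >= 1 and len(v) == n
    w = [0] * n
    a = 1                                  # 1 + k with k = 0
    for j in range(n, 0, -1):
        s = u[j - 1] - v[j - 1] + a + (b - 1)
        assert 0 <= s < 2 * b              # the bound from step S2
        overflow = s >= b                  # a word of size b cannot hold s
        w[j - 1] = s - b if overflow else s
        a = 1 if overflow else 0           # overflow <=> no borrow
    if a != 1:
        raise ValueError("v > u")
    return w


print(sub_biased([1, 0, 0, 2], [0, 0, 0, 5], 10))   # [0, 9, 9, 7]
```

The design point is that a single overflow test replaces the separate sign test that a direct transcription of step S2 would need.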

The reader may wonder if it would not be worth while to have a combined addition-subtraction routine in place of the two algorithms A and S. Study of the computer programs shows that it is generally better to use two different routines, so that the inner loop of the computation can be performed as rapidly as possible, since the programs are so short.

Our next problem is multiplication, and here we carry the ideas used in Algorithm A a little further:

Algorithm M (Multiplication of nonnegative integers). Given nonnegative integers $u_1u_2\ldots u_n$ and $v_1v_2\ldots v_m$ with radix $b$, this algorithm forms their product $(w_1w_2\ldots w_{m+n})_b$. (The conventional pencil-and-paper method is based on forming the partial products $(u_1u_2\ldots u_n) \times v_j$ first, for $1 \le j \le m$, and then adding these products together with appropriate scale factors; but in a computer it is best to do the addition concurrently with the multiplication, as described in this algorithm.)

M1. [Initialize.] Set $w_{m+1}, w_{m+2}, \ldots, w_{m+n}$ all to zero. Set $j \leftarrow m$. (If $w_{m+1}, \ldots, w_{m+n}$ were not cleared to zero in this step, we would have a more general algorithm which sets $(w_1\ldots w_{m+n}) \leftarrow (u_1\ldots u_n) \times (v_1\ldots v_m) + (w_{m+1}\ldots w_{m+n})$.)

M2. [Zero multiplier?] If $v_j = 0$, set $w_j \leftarrow 0$ and go to step M6. (This test saves a good deal of time if there is a reasonable chance that $v_j$ is zero, but otherwise it may be omitted without affecting the validity of the algorithm.)

M3. [Initialize $i$.] Set $i \leftarrow n$, $k \leftarrow 0$.

M4. [Multiply and add.] Set $t \leftarrow u_i \times v_j + w_{i+j} + k$; then set $w_{i+j} \leftarrow t \bmod b$, $k \leftarrow \lfloor t/b \rfloor$. (Here the "carry" $k$ will always be in the range $0 \le k < b$; see below.)

M5. [Loop on $i$.] Decrease $i$ by one. Now if $i > 0$, go back to step M4; otherwise set $w_j \leftarrow k$.

M6. [Loop on $j$.] Decrease $j$ by one. Now if $j > 0$, go back to step M2; otherwise the algorithm terminates.
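
Algorithm M can also be transcribed into Python. In this sketch (the name `mul_nonneg` is hypothetical; lists are most significant first, with `w[p]` holding $w_{p+1}$) the loops run exactly as in steps M1 through M6, and it reproduces the product $914 \times 84 = 76776$ that is traced in Table 1.

```python
def mul_nonneg(u: list[int], v: list[int], b: int) -> list[int]:
    """Algorithm M: (w1 ... w_{m+n})_b = (u1 ... un)_b * (v1 ... vm)_b."""
    n, m = len(u), len(v)
    assert n >= 1 and m >= 1
    w = [0] * (m + n)              # M1: w_{m+1}..w_{m+n} start at zero
    for j in range(m, 0, -1):                  # M6 loop: j = m, ..., 1
        if v[j - 1] == 0:                      # M2: optional shortcut
            w[j - 1] = 0
            continue
        k = 0                                  # M3
        for i in range(n, 0, -1):              # M5 loop: i = n, ..., 1
            t = u[i - 1] * v[j - 1] + w[i + j - 1] + k   # M4: t < b*b
            w[i + j - 1] = t % b
            k = t // b                         # 0 <= k < b
        w[j - 1] = k                           # M5: w_j <- k
    return w


print(mul_nonneg([9, 1, 4], [8, 4], 10))   # [7, 6, 7, 7, 6]
```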

Algorithm M is illustrated in Table 1, assuming that $b = 10$, by showing the states of the computation at the beginning of steps M5 and M6. A proof of Algorithm M appears in the answer to exercise 14.

The two inequalities
$$0 \le t < b^2, \qquad 0 \le k < b \tag{1}$$
are crucial for an efficient implementation of this algorithm, since they point out how large a register is needed for the computations. These inequalities may be proved by induction as the algorithm proceeds, for if we have $k < b$ at the start of step M4, we have
$$u_i \times v_j + w_{i+j} + k \le (b-1)\times(b-1) + (b-1) + (b-1) = b^2 - 1 < b^2;$$
hence $t < b^2$, and the new carry $k = \lfloor t/b \rfloor$ is again less than $b$.

Table 1
MULTIPLICATION OF 914 BY 84

    Step   i   j   u_i   v_j   t    w1  w2  w3  w4  w5
    M5     3   2   4     4     16   x   x   0   0   6
    M5     2   2   1     4     05   x   x   0   5   6
    M5     1   2   9     4     36   x   x   6   5   6
    M6     0   2   x     4     36   x   3   6   5   6
    M5     3   1   4     8     37   x   3   6   7   6
    M5     2   1   1     8     17   x   3   7   7   6
    M5     1   1   9     8     76   x   6   7   7   6
    M6     0   1   x     8     76   7   6   7   7   6

The following MIX program shows the considerations which are necessary when Algorithm M is implemented on a computer. The coding for step M4 would be a little simpler if our computer had a "multiply-and-add" instruction, or if it had a double-length accumulator for addition.

Program M (Multiplication of nonnegative integers). This program is analogous to Program A. rI1 ≡ $i$, rI2 ≡ $j$, rI3 ≡ $i + j$, CONTENTS(CARRY) ≡ $k$.

          ENT1  N       1        M1. Initialize.
          JOV   OFLO    1        Ensure overflow is off.
          STZ   W+M,1   N        w_{m+i} ← 0.
          DEC1  1       N
          J1P   *-2     N        Repeat for n ≥ i > 0.
          ENT2  M       1        j ← m.
    1H    LDX   V,2     M        M2. Zero multiplier?
          JXZ   8F      M        If v_j = 0, set w_j ← 0 and go to M6.
          ENT1  N       M-Z      M3. Initialize i.
          ENT3  N,2     M-Z      i ← n, (i+j) ← n+j.
          ENTX  0       M-Z      k ← 0.
    2H    STX   CARRY   (M-Z)N   M4. Multiply and add.
          LDA   U,1     (M-Z)N   u_i
          MUL   V,2     (M-Z)N   × v_j
          SLC   5       (M-Z)N   Interchange rA ↔ rX.
          ADD   W,3     (M-Z)N   Add w_{i+j} to lower half.
          JNOV  *+2     (M-Z)N   Did overflow occur?
          INCX  1       K        If so, carry one into upper half.
          ADD   CARRY   (M-Z)N   Add k to lower half.
          JNOV  *+2     (M-Z)N   Did overflow occur?
          INCX  1       K'       If so, carry one into upper half.
          STA   W,3     (M-Z)N   w_{i+j} ← t mod b.
          DEC1  1       (M-Z)N   M5. Loop on i.
          DEC3  1       (M-Z)N   Decrease i and (i+j) by one.
          J1P   2B      (M-Z)N   Back to M4 if i > 0; rX = ⌊t/b⌋.
    8H    STX   W,2     M        Set w_j ← k.
          DEC2  1       M        M6. Loop on j.
          J2P   1B      M        Repeat until j = 0.

The execution time of Program M depends on the number of places, $M$, in the multiplier; the number of places, $N$, in the multiplicand; the number of zeros, $Z$, in the multiplier; and the number of carries, $K$ and $K'$, which occur during the addition to the lower half of the product in the computation of $t$. If we approximate both $K$ and $K'$ by the reasonable (although somewhat pessimistic) values $\tfrac12(M - Z)N$, we find that the total running time comes to $28MN + 10M + 4N + 3 - Z(28N + 3)$ cycles. If step M2 were deleted, the running time would be $28MN + 7M + 4N + 3$ cycles, so this step is not advantageous unless the density of zero positions within the multiplier is $Z/M > 3/(28N + 3)$. If the multiplier is chosen completely at random, this ratio $Z/M$ is expected to be only about $1/b$, which is extremely small; so step M2 is generally not worth while.
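
The two cycle counts above, and the break-even density, are simple enough to tabulate. This Python sketch only evaluates the formulas quoted in the text (with the same approximation $K = K' = \tfrac12(M - Z)N$); the function names are ours.

```python
from fractions import Fraction


def cycles_with_m2(M: int, N: int, Z: int) -> int:
    """Running time of Program M as quoted in the text."""
    return 28 * M * N + 10 * M + 4 * N + 3 - Z * (28 * N + 3)


def cycles_without_m2(M: int, N: int) -> int:
    """Running time quoted in the text if step M2 is deleted."""
    return 28 * M * N + 7 * M + 4 * N + 3


def break_even_density(N: int) -> Fraction:
    """Zero density Z/M above which keeping step M2 saves time."""
    return Fraction(3, 28 * N + 3)


print(break_even_density(10))                                 # 3/283
print(cycles_with_m2(10, 10, 0) - cycles_without_m2(10, 10))  # 30
```

For $N = 10$ the break-even density is $3/283 \approx 0.011$, far above the density $1/b = 10^{-10}$ expected for random multiplier digits when $b = 10^{10}$; with no zeros at all, step M2 simply costs $3M$ extra cycles.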

Algorithm M is not the fastest way to multiply when $m$ and $n$ are large, although it has the advantage of simplicity. Speedier methods are discussed in Section 4.3.3; even when $m = n = 4$, it is possible to multiply numbers in a little less time than is required by Algorithm M.

The final algorithm of concern to us in this section is long division, in which we want to divide $(n + m)$-place integers by $n$-place integers. Here the ordinary pencil-and-paper method involves a certain amount of guesswork and ingenuity on the part of the person doing the division; we must either eliminate this guesswork from the algorithm or develop some theory to explain it more carefully.

A moment's reflection about the ordinary process of long division shows that the general problem breaks down into simpler steps, each of which is the division of an $(n+1)$-place number $u$ by the $n$-place divisor $v$, where $0 \le u/v < b$; the remainder $r$ after each step is less than $v$, so we may use $rb + (\text{next place of dividend})$ as the new $u$ in the succeeding step. For example, if we are asked to divide 3142 by 47, we first divide 314 by 47, getting 6 and a remainder of 32; then we divide 322 by 47, getting 6 and a remainder of 40; thus we have a quotient of 66 and a remainder of 40. It is clear that this same idea works in general, and so our search for an appropriate division algorithm reduces to the following problem (Fig. 6):

Let $u = u_0u_1\ldots u_n$ and $v = v_1v_2\ldots v_n$ be nonnegative integers in radix-$b$ notation, such that $u/v < b$. Find an algorithm to determine $q = \lfloor u/v \rfloor$.
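
The reduction to $(n+1)$-place by $n$-place steps is easy to express in code. In the sketch below (the name `long_divide` is hypothetical, and the divisor is kept as a plain Python integer for brevity), the single quotient place of each step is found by brute force with Python's `//`, standing in for exactly the subproblem posed above.

```python
def long_divide(dividend: list[int], v: int, b: int) -> tuple[list[int], int]:
    """Divide (d1 d2 ... dk)_b by an integer v > 0, one place at a time.

    Each step divides u = r*b + (next place of dividend) by v; since the
    previous remainder r is less than v, we have 0 <= u/v < b, so every
    step produces a single quotient place.  Returns (quotient places,
    remainder); leading zero places of the quotient are kept.
    """
    assert v > 0 and b >= 2
    quotient, r = [], 0
    for place in dividend:
        u = r * b + place
        q = u // v            # the subproblem of Fig. 6, solved by brute force
        assert 0 <= q < b
        r = u - q * v         # 0 <= r < v
        quotient.append(q)
    return quotient, r


print(long_divide([3, 1, 4, 2], 47, 10))   # ([0, 0, 6, 6], 40)
```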

Fig. 6. Wanted: a way to determine $q$ rapidly.
folio 350 galley 3 WARNING: Much of this tape unreadable!

We may observe that the condition $u/v < b$ is equivalent to the condition that $u/b < v$; i.e., $\lfloor u/b \rfloor < v$; and this is the condition that $u_0u_1\ldots u_{n-1} < v_1v_2\ldots v_n$. Furthermore, if we write $r = u - qv$, then $q$ is the unique integer such that $0 \le r < v$.

The most obvious approach to this problem is to make a guess about $q$, based on the most significant digits of $u$ and $v$. It isn't obvious that such a method will be reliable enough, but it is worth investigating; let us therefore set
$$\hat q = \min\left(\left\lfloor \frac{u_0 b + u_1}{v_1} \right\rfloor,\; b - 1\right). \tag{2}$$
This formula says $\hat q$ is obtained by dividing the two leading digits of $u$ by the leading digit of $v$; and if the result is $b$ or more we can replace it by $(b - 1)$.

It is a remarkable fact, which we will now investigate, that this value $\hat q$ is always a very good approximation to the desired answer $q$, so long as $v_1$ is reasonably large. In order to analyze how close $\hat q$ comes to $q$, we will first prove that $\hat q$ is never too small.

Theorem A. In the notation above, $\hat q \ge q$.

Proof. Since $q \le b - 1$, the theorem is certainly true if $\hat q = b - 1$. Suppose therefore that $\hat q < b - 1$; it follows that $\hat q = \lfloor (u_0 b + u_1)/v_1 \rfloor$, hence $\hat q v_1 \ge u_0 b + u_1 - v_1 + 1$. Therefore
$$\begin{aligned}
u - \hat q v &\le u - \hat q v_1 b^{n-1}\\
&\le u_0 b^n + \cdots + u_n - \left(u_0 b^n + u_1 b^{n-1} - v_1 b^{n-1} + b^{n-1}\right)\\
&= u_2 b^{n-2} + \cdots + u_n - b^{n-1} + v_1 b^{n-1}\\
&< v_1 b^{n-1} \le v.
\end{aligned}$$
Since $u - \hat q v < v$, we must have $\hat q \ge q$.

We will now prove that $\hat q$ cannot be much larger than $q$ in practical situations. Assume that $\hat q \ge q + 3$. We have
$$\hat q \le \frac{u_0 b + u_1}{v_1} = \frac{u_0 b^n + u_1 b^{n-1}}{v_1 b^{n-1}} \le \frac{u}{v_1 b^{n-1}} < \frac{u}{v - b^{n-1}}.$$
(The case $v = b^{n-1}$ is impossible, for if $v = (100\ldots0)_b$ then $q = \hat q$.) Furthermore, since $q > (u/v) - 1$,
$$3 \le \hat q - q < \frac{u}{v - b^{n-1}} - \frac{u}{v} + 1 = \frac{u}{v}\left(\frac{b^{n-1}}{v - b^{n-1}}\right) + 1.$$
Therefore
$$\frac{u}{v} > 2\left(\frac{v - b^{n-1}}{b^{n-1}}\right) \ge 2(v_1 - 1).$$
Finally, since $b - 4 \ge \hat q - 3 \ge q = \lfloor u/v \rfloor \ge 2(v_1 - 1)$, we have $v_1 < \lfloor b/2 \rfloor$. This proves Theorem B:

Theorem B. If $v_1 \ge \lfloor b/2 \rfloor$, then $\hat q - 2 \le q \le \hat q$.
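
Theorems A and B can be spot-checked by brute force for small radices. The sketch below is a test harness of our own, not a proof (the names `q_hat` and `error_ranges` are hypothetical): it evaluates the trial quotient (2) for every pair with $v_1 \ne 0$ and $u/v < b$, and records the error $\hat q - q$ separately for divisors that do and do not satisfy $v_1 \ge \lfloor b/2 \rfloor$.

```python
def q_hat(u: int, v: int, b: int, n: int) -> int:
    """Trial quotient (2), where u = (u0 u1 ... un)_b and v = (v1 ... vn)_b."""
    u0 = u // b**n
    u1 = (u // b**(n - 1)) % b
    v1 = v // b**(n - 1)
    return min((u0 * b + u1) // v1, b - 1)


def error_ranges(b: int, n: int) -> tuple[int, int, int]:
    """Scan every n-place v with v1 != 0 and every u with 0 <= u < b*v.

    Returns (least error overall, greatest error when v1 >= b//2,
    greatest error when v1 < b//2), where error = q_hat - floor(u/v).
    """
    least, worst_normal, worst_small = b, 0, 0
    for v in range(b**(n - 1), b**n):
        normalized = v // b**(n - 1) >= b // 2
        for u in range(b * v):
            err = q_hat(u, v, b, n) - u // v
            least = min(least, err)
            if normalized:
                worst_normal = max(worst_normal, err)
            else:
                worst_small = max(worst_small, err)
    return least, worst_normal, worst_small


least, worst_normal, worst_small = error_ranges(10, 2)
print(least >= 0)          # True  (Theorem A)
print(worst_normal <= 2)   # True  (Theorem B)
print(worst_small > 2)     # True: without normalization the error can exceed 2
```

For instance, with $b = 10$, $v = 19$ and $u = 100$, the trial quotient is 9 while $q = 5$, which shows why the condition on $v_1$ matters.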

The most important part of this theorem is that the conclusion is independent of $b$; no matter how large $b$ is, the trial quotient $\hat q$ will never be more than 2 in error.

The condition that $v_1 \ge \lfloor b/2 \rfloor$ is very much like a normalization condition (in fact, it is exactly the condition of normalization in a binary computer). One simple way to ensure that $v_1$ is sufficiently large is to multiply both $u$ and $v$ by $\lfloor b/(v_1 + 1) \rfloor$; that does not change the value of $u/v$, nor does it increase the number of places in $v$, and exercise 23 proves that it will always make the new value of $v_1$ large enough. (Note: For another way to normalize the divisor, see exercise 28.)
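
The claim that multiplying by $d = \lfloor b/(v_1 + 1) \rfloor$ never adds a place to $v$ and always makes $v_1 \ge \lfloor b/2 \rfloor$ can be checked exhaustively for small radices. The following Python sketch is such a check, not the proof of exercise 23; `normalize_check` is a hypothetical helper name.

```python
def normalize_check(b: int, n: int) -> bool:
    """For every n-place v with v1 != 0, multiply by d = floor(b/(v1+1))
    and confirm that d*v still has n places and a leading place of at
    least floor(b/2)."""
    for v in range(b**(n - 1), b**n):
        v1 = v // b**(n - 1)
        dv = (b // (v1 + 1)) * v
        if dv >= b**n or dv // b**(n - 1) < b // 2:
            return False
    return True


b = 10
v = 19                            # v1 = 1 is too small for Theorem B
d = b // (v // b + 1)             # d = 5
print(d, d * v)                   # 5 95
print(all(normalize_check(base, 2) for base in range(2, 41)))   # True
```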

Now that we have armed ourselves with all of these facts, we are in a position to write the desired long division algorithm. This algorithm uses a slightly improved choice of $\hat q$ in step D3, which guarantees that $q = \hat q$ or $\hat q - 1$; in fact, the improved choice of $\hat q$ made here is almost always accurate.

Algorithm D (Division of nonnegative integers). Given nonnegative integers $u = u_1u_2\ldots u_{m+n}$ and $v = v_1v_2\ldots v_n$ with radix $b$, where $v_1 \ne 0$ and $n > 1$, we form the quotient $\lfloor u/v \rfloor = (q_0q_1\ldots q_m)_b$ and the remainder $u \bmod v = (r_1r_2\ldots r_n)_b$. (This notation is slightly different from that used in the above proofs. When $n = 1$, the simpler algorithm of exercise 16 should be used.)

D1. [Normalize.] Set $d \leftarrow \lfloor b/(v_1 + 1) \rfloor$. Set $u_0u_1u_2\ldots u_{m+n}$ equal to $u_1u_2\ldots u_{m+n}$ times $d$. Set $v_1v_2\ldots v_n$ equal to $v_1v_2\ldots v_n$ times $d$. (Note the introduction of the new digit position $u_0$ at the left of $u_1$; if $d = 1$, all we need to do in this step is to set $u_0 \leftarrow 0$. On a binary computer it may be preferable to choose $d$ to be a power of 2 instead of using the value suggested here; any value of $d$ that results in $v_1 \ge \lfloor b/2 \rfloor$ will suffice here.)

D2. [Initialize $j$.] Set $j \leftarrow 0$. (The loop on $j$, steps D2 through D7, will be essentially a division of $u_ju_{j+1}\ldots u_{j+n}$ by $v_1v_2\ldots v_n$ to get a single quotient digit $q_j$; cf. Fig. 6.)

D3. [Calculate $\hat q$.] If $u_j = v_1$, set $\hat q \leftarrow b - 1$; otherwise set $\hat q \leftarrow \lfloor (u_j b + u_{j+1})/v_1 \rfloor$. Now test if $v_2 \hat q > (u_j b + u_{j+1} - \hat q v_1) b + u_{j+2}$; if so, decrease $\hat q$ by 1 and repeat this test. (The latter test determines at high speed most of the cases in which the trial value $\hat q$ is one too large, and it eliminates all cases where $\hat q$ is two too large; see exercises 19, 20, 21.)

D4. [Multiply and subtract.] Replace $u_ju_{j+1}\ldots u_{j+n}$ by $u_ju_{j+1}\ldots u_{j+n}$ minus ($\hat q$ times $v_1v_2\ldots v_n$). This step (analogous to steps M3 to M5 of Algorithm M) consists of a simple multiplication by a one-place number, combined with a subtraction. The digits $u_ju_{j+1}\ldots u_{j+n}$ should be kept positive; if the result of this step is actually negative, $u_ju_{j+1}\ldots u_{j+n}$ should be left as the true value plus $b^{n+1}$, i.e., as the $b$'s complement of the true value, and a "borrow" to the left should be remembered.

D5. [Test remainder.] Set $q_j \leftarrow \hat q$. If the result of step D4 was negative, go to step D6; otherwise go on to step D7.

D6. [Add back.] (The probability that this step is necessary is very small, on the order of only $3/b$; see exercise 21. Test data that activates this step should therefore be specifically contrived when debugging.) Decrease $q_j$ by 1, and add $0v_1v_2\ldots v_n$ to $u_ju_{j+1}u_{j+2}\ldots u_{j+n}$. (A carry will occur to the left of $u_j$, and it should be ignored since it cancels with the "borrow" that occurred in D4.)

D7. [Loop on $j$.] Increase $j$ by one. Now if $j \le m$, go back to D3.

D8. [Unnormalize.] Now $q_0q_1\ldots q_m$ is the desired quotient, and the desired remainder may be obtained by dividing $u_{m+1}\ldots u_{m+n}$ by $d$.

Fig. 7. Long division.
 3299  of Algorithm D as a |¬m|¬i|¬x program has several 
 3308  points of interest:|'{A12}|≡P|≡r|≡o|≡g|≡r|≡a|≡m 
 3312  |≡D (|εDivision of nonnegative integers).|9|4|πThe 
 3317  conventions of this program are analogous to 
 3324  Program A; rI1|4|"o|4|εi, |πrI2|4|"o|4|εj|4α_↓|4m, 
 3328  |πrI3|4|"o|4|εi|4α+↓|4j. |πSteps D1 and D8 have 
 3334  been left as exercises.|'{A12}{H9L11M33}|∂!!!|∂!!|9|∂!!!|9|∂
 3338  !!!!!!|9|∂!!!!!!!!!|∂!!!!!!!!!!!!!!!!!!!!|∂|E|;
    001  D1   JOV   OFLO          1           D1. Normalize.
         ...                                  (See exercise 25)
    039  D2   ENN2  M             1           D2. Initialize j.
    040       STZ   V             1           Set v0 ← 0, for convenience in D4.
    041  D3   LDA   U+M,2(1:5)    M+1         D3. Calculate q̂.
    042       LDX   U+M+1,2       M+1         rAX ← u_j b + u_{j+1}.
    043       DIV   V+1           M+1         rA ← ⌊rAX/v1⌋.
    044       JOV   1F            M+1         Jump if quotient = b.
    045       STA   QHAT          M+1         q̂ ← rA.
    046       STX   RHAT          M+1         r̂ ← u_j b + u_{j+1} − q̂ v1
    047       JMP   2F            M+1           = (u_j b + u_{j+1}) mod v1.
    048  1H   LDX   WM1                       rX ← b − 1.
    049       LDA   U+M+1,2                   rA ← u_{j+1}. (Here u_j = v1.)
    050       JMP   4F
    051  3H   LDX   QHAT          E
          ⋮
                    QHAT          (N+1)(M+1)  rAX ← −q̂ v_i.
    071       SLC   5             (N+1)(M+1)  Interchange rA ↔ rX.
    072       ADD   CARRY         (N+1)(M+1)  Add the contribution from the
    073       JNOV  *+2           (N+1)(M+1)    digit to the right, plus 1.
    074       DECX  1             K           If sum is ≤ −b, carry −1.
    075       ADD   U,3           (N+1)(M+1)  Add u_{i+j}.
    076       ADD   WM1           (N+1)(M+1)  Add b − 1 to force + sign.
    077       JNOV  *+2           (N+1)(M+1)  If no overflow, carry −1.
    078       INCX  1             K′          rX ≡ carry + 1.
    079       STA   U,3           (N+1)(M+1)  u_{i+j} ← rA (may be minus zero).
    080       DEC1  1             (N+1)(M+1)
    081       DEC3  1             (N+1)(M+1)
    082       J1NN  2B            (N+1)(M+1)  Repeat for n ≥ i ≥ 0.
    083  D5   LDA   QHAT          M+1         D5. Test remainder.
    084       STA   Q+M,2         M+1         Set q_j ← q̂.
    085       JXP   D7            M+1         (Here rX = 0 or 1, since v0 = 0.)
    086  D6   DECA  1                         D6. Add back.
    087       STA   Q+M,2                     Set q_j ← q̂ − 1.
    088       ENT1  N                         i ← n.
    089       ENT3  M+N,2                     (i + j) ← n + j.
    090  1H   ENTA  0                         (This is essentially Program A.)
    091  2H   ADD   U,3
    092       ADD   V,1
    093       STA   U,3
    094       DEC1  1
    095       DEC3  1
    096       JNOV  1B
    097       ENTA  1
    098       J1P   2B                        (Not necessary to add to u_j.)
    099  D7   INC2  1             M+1         D7. Loop on j.
    100       J2NP  D3            M+1         Repeat for 0 ≤ j ≤ m.
    101  D8   ...                             (See exercise 26)
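Steps D1 through D8 can also be followed in a higher-level language. The Python sketch below is not a translation of Program D. It works on lists of radix-$b$ digits, most significant first, as in the text's $u_1 u_2 \ldots u_{m+n}$. It assumes $n \ge 2$ and $v_1 \ne 0$.

Because Python integers are unbounded, none of the overflow handling of the MIX program is needed, and the D3 test is applied exactly as stated. The function names `mul_digit` and `algorithm_d` are hypothetical.

```python
def mul_digit(x, d, b):
    """Return (x_1 ... x_k)_b times the single digit d, as k + 1 digits."""
    out = [0] * (len(x) + 1)
    carry = 0
    for i in range(len(x) - 1, -1, -1):
        t = x[i] * d + carry
        out[i + 1] = t % b
        carry = t // b
    out[0] = carry
    return out


def algorithm_d(u, v, b):
    """Divide u = (u_1 ... u_{m+n})_b by v = (v_1 ... v_n)_b, where n >= 2 and
    v_1 != 0; digit lists are most significant first.  Returns the quotient
    digits q_0 ... q_m and the remainder digits r_1 ... r_n."""
    n, m = len(v), len(u) - len(v)
    assert n >= 2 and m >= 0 and v[0] != 0
    # D1. Normalize: d = floor(b/(v_1 + 1)) makes v_1 >= floor(b/2).
    d = b // (v[0] + 1)
    u = mul_digit(u, d, b)        # u_0 u_1 ... u_{m+n}; u[0] is the new digit u_0
    v = mul_digit(v, d, b)[1:]    # still n digits, since v*d < b**n
    q = [0] * (m + 1)
    for j in range(m + 1):        # D2 and D7: j = 0, 1, ..., m
        # D3. Trial quotient from two leading digits, refined using v_2.
        top = u[j] * b + u[j + 1]
        qhat = b - 1 if u[j] == v[0] else top // v[0]
        while v[1] * qhat > (top - qhat * v[0]) * b + u[j + 2]:
            qhat -= 1
        # D4. Replace u_j ... u_{j+n} by u_j ... u_{j+n} - qhat * (0 v_1 ... v_n).
        borrow = 0
        for i in range(n - 1, -1, -1):
            t = u[j + 1 + i] - qhat * v[i] - borrow
            u[j + 1 + i] = t % b
            borrow = -(t // b)
        t = u[j] - borrow
        u[j] = t % b              # a negative result is left as its b's complement
        # D5. Test remainder.
        q[j] = qhat
        if t < 0:
            # D6. Add back (rare): qhat was one too large.
            q[j] -= 1
            carry = 0
            for i in range(n - 1, -1, -1):
                s = u[j + 1 + i] + v[i] + carry
                u[j + 1 + i] = s % b
                carry = s // b
            u[j] = (u[j] + carry) % b   # this carry cancels the borrow from D4
    # D8. Unnormalize: the remainder is (u_{m+1} ... u_{m+n})_b / d.
    r, rem = [], 0
    for digit in u[m + 1:]:
        rem = rem * b + digit
        r.append(rem // d)
        rem %= d
    assert rem == 0               # an exact multiple of d (cf. exercise 27)
    return q, r
```

For example, `algorithm_d([1, 2, 3, 4, 5], [6, 7], 10)` returns `([0, 1, 8, 4], [1, 7])`, since 12345 = 184 · 67 + 17. The quotient always has $m + 1$ digits, so leading zeros are kept.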
Note how easily the rather complex-appearing calculations and decisions of step D3 can be handled inside the machine. Note also that the program for step D4 is analogous to Program M, except that the ideas of Program S have also been incorporated. In step D6, use has been made of the fact that $v_0 = 0$ and that $u_j$ is not needed in the subsequent calculations. A strict interpretation of Algorithm D would require line 098 to be "J1NN 2B".

The running time for Program D can be estimated by considering the quantities $M$, $N$, $E$, $K$, and $K'$ shown in the program. (These quantities ignore several situations that can occur only with very low probability; for example, we may assume that lines 048–050, 063–064, and step D6 are never executed.)

Here $M + 1$ is the number of words in the quotient, $N$ is the number of words in the divisor, and $E$ is the number of times $\hat q$ is adjusted downwards in step D3. $K$ and $K'$ are the numbers of times certain "carry" adjustments are made during the multiply-subtract loop. If we assume that $K + K'$ is approximately $(N + 1)(M + 1)$, and that $E$ is approximately $\frac12 M$, we get a total running time of approximately
$$30MN + 30N + 89M + 111$$
cycles, plus $67N + 235M + 4$ more if $d > 1$. (The program segments of exercises 25 and 26 are included in these totals.) When $M$ and $N$ are large, this is only about seven percent longer than the time Program M takes to multiply the quotient by the divisor.

Further commentary on Algorithm D appears in the exercises at the close of this section.

It is possible to debug programs for multiple-precision arithmetic by using the multiplication and addition routines to check the result of the division routine, etc. The following type of test data is occasionally useful:
$$(t^m - 1)(t^n - 1) = t^{m+n} - t^n - t^m + 1.$$
If $m < n$, this number has the radix-$t$ expansion
$$\underbrace{(t-1)\ldots(t-1)}_{m-1\ \text{places}}\;(t-2)\;\underbrace{(t-1)\ldots(t-1)}_{n-m\ \text{places}}\;\underbrace{0\ldots0}_{m-1\ \text{places}}\;1;$$
for example, $(10^3 - 1)(10^5 - 1) = 99899001$.
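The digit pattern claimed for these test numbers can be checked mechanically before they are used as operands. The Python sketch below (function names hypothetical) compares the actual radix-$t$ digits of $(t^m - 1)(t^n - 1)$, padded to $m + n$ places, with the predicted expansion for $1 \le m < n$.

```python
def radix_digits(x, t, width):
    """Radix-t digits of x >= 0, most significant first, zero-padded to width."""
    digits = []
    for _ in range(width):
        x, r = divmod(x, t)
        digits.append(r)
    if x != 0:
        raise ValueError("x needs more than width digits")
    return digits[::-1]


def predicted_expansion(t, m, n):
    """Digit pattern claimed in the text for (t**m - 1)*(t**n - 1), 1 <= m < n."""
    return ([t - 1] * (m - 1) + [t - 2] + [t - 1] * (n - m)
            + [0] * (m - 1) + [1])


def check(t, m, n):
    """True if the actual digits match the predicted pattern."""
    x = (t ** m - 1) * (t ** n - 1)
    assert x == t ** (m + n) - t ** n - t ** m + 1
    return radix_digits(x, t, m + n) == predicted_expansion(t, m, n)
```

For instance, `radix_digits(99899001, 10, 8)` gives `[9, 9, 8, 9, 9, 0, 0, 1]`. This agrees with `predicted_expansion(10, 3, 5)`.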
In the case of Program D, it is also necessary to find some test cases that cause the rarely executed parts of the program to be used. Some portions of that program would probably never get tested even if a million random test cases were tried.

Now that we have seen how to operate with signed-magnitude numbers, let us consider what approach should be taken to the same problems when a computer with complement notation is being used. For two's complement and ones' complement notations, it is best to let the radix $b$ be one-half the word size; thus for a 32-bit computer word we would use $b = 2^{31}$ in the above algorithms. The sign bit of all but the most significant word of a multiple-precision number will be zero, so that no anomalous sign correction takes place during the computer's multiplication and division operations.

In fact, the basic meaning of complement notation requires that we consider all but the most significant word to be nonnegative. For example, assuming a 10-bit word, the two's complement number
$$1101111110 \qquad 111111010 \qquad 011101011$$
(where the sign is given only for the most significant word) is properly thought of as
$$-2^{27} + (101111110)_2 \cdot 2^{18} + (111111010)_2 \cdot 2^{9} + (011101011)_2.$$

Addition of signed numbers is slightly easier when complement notations are being used, since the routine for adding $n$-place nonnegative integers can be used for arbitrary $n$-place integers. The sign appears only in the first word, so the less significant words may be added together irrespective of the actual sign. (Special attention must be given to the leftmost carry when ones' complement notation is being used, however; it must be added into the least significant word, and possibly propagated further to the left.) Similarly, we find that subtraction of signed numbers is slightly simpler with complement notation.

On the other hand, multiplication and division seem to be done most easily by working with nonnegative quantities and doing suitable complementation operations beforehand to make sure that both operands are nonnegative. It may be possible to avoid this complementation by devising some tricks for working directly with negative numbers in a complement notation. It is not hard to see how this could be done in double-precision multiplication, but care should be taken not to slow down the inner loops of the subroutines when high precision is required. Note that the product of two $m$-place numbers in two's complement notation may require $2m + 1$ places: the square of $-b^m$ is $b^{2m}$.

Let us now turn to an analysis of the quantity $K$ that arises in Program A, i.e., the number of carries that occur when $n$-place numbers are being added together. This quantity $K$ plays no part in the total running time of Program A. It does affect the running time of the counterparts of Program A that deal with complement notations, and its analysis is interesting in itself as a significant application of generating functions.

Suppose now that $u$ and $v$ are independent random $n$-place integers, uniformly distributed in the range $0 \le u, v < b^n$. Let $p_{nk}$ be the probability that exactly $k$ carries occur in the addition of $u$ to $v$, and that one of these carries occurred in the most significant position (so that $u + v \ge b^n$). Similarly, let $q_{nk}$ be the probability that exactly $k$ carries occur, but there is no carry in the most significant position. Then it is not hard to see that
$$p_{0k} = 0, \qquad q_{0k} = \delta_{0k}, \qquad \text{for all } k;$$
$$p_{(n+1)(k+1)} = \frac{b+1}{2b}\,p_{nk} + \frac{b-1}{2b}\,q_{nk}, \qquad q_{(n+1)k} = \frac{b-1}{2b}\,p_{nk} + \frac{b+1}{2b}\,q_{nk}. \tag{3}$$
This happens because $(b-1)/2b$ is the probability that $u_1 + v_1 \ge b$, and $(b+1)/2b$ is the probability that $u_1 + v_1 + 1 \ge b$, when $u_1$ and $v_1$ are independently and uniformly distributed integers in the range $0 \le u_1, v_1 < b$.

To obtain further information about these quantities $p_{nk}$ and $q_{nk}$, we may set up the generating functions
$$P(z,t) = \sum_{k,n} p_{nk} z^k t^n, \qquad Q(z,t) = \sum_{k,n} q_{nk} z^k t^n; \tag{4}$$
from (3) we have the basic relations
$$P(z,t) = zt\left(\frac{b+1}{2b}\,P(z,t) + \frac{b-1}{2b}\,Q(z,t)\right),$$
$$Q(z,t) = 1 + t\left(\frac{b-1}{2b}\,P(z,t) + \frac{b+1}{2b}\,Q(z,t)\right).$$
These two equations are readily solved for $P(z,t)$ and $Q(z,t)$. Now let
$$G(z,t) = P(z,t) + Q(z,t) = \sum_n G_n(z)\, t^n,$$
where $G_n(z)$ is the generating function for the total number of carries when $n$-place numbers are added. We find that
$$G(z,t) = \frac{b - zt}{p(z,t)}, \qquad \text{where} \quad p(z,t) = b - \tfrac12(1+b)(1+z)t + zt^2. \tag{5}$$
Note that $G(1,t) = 1/(1-t)$. This checks with the fact that $G_n(1)$ must equal 1 (it is the sum of all the possible probabilities). Taking partial derivatives of (5) with respect to $z$, we find that
$$\frac{\partial G}{\partial z} = \sum_n G_n'(z)\, t^n = \frac{-t}{p(z,t)} + \frac{t(b - zt)\bigl(\tfrac12(1+b) - t\bigr)}{p(z,t)^2},$$
$$\frac{\partial^2 G}{\partial z^2} = \sum_n G_n''(z)\, t^n = \frac{-t^2(b + 1 - 2t)}{p(z,t)^2} + \frac{t^2(b - zt)(b + 1 - 2t)^2}{2\,p(z,t)^3}.$$
Now let us put $z = 1$ and expand in partial fractions:
$$\sum_n G_n'(1)\, t^n = \frac{t}{2}\left(\frac{1}{(1-t)^2} - \frac{1}{(b-1)(1-t)} + \frac{1}{(b-1)(b-t)}\right),$$
$$\sum_n G_n''(1)\, t^n = \frac{t^2}{2}\left(\frac{1}{(1-t)^3} - \frac{1}{(b-1)^2(1-t)} + \frac{1}{(b-1)^2(b-t)} + \frac{1}{(b-1)(b-t)^2}\right).$$
It follows that the average number of carries, i.e., the mean value of $K$, is
$$G_n'(1) = \frac12\left(n - \frac{1}{b-1}\left(1 - \left(\frac1b\right)^{n}\right)\right). \tag{6}$$
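Recurrence (3) can also be run directly, which gives an independent check on formula (6). The Python sketch below (function names hypothetical) uses exact rational arithmetic:

- It computes the exact distribution of $K$ from (3).
- It confirms that distribution by brute-force enumeration of all pairs $u, v$ for small $b$ and $n$.
- It evaluates the mean, for comparison with (6).

```python
from fractions import Fraction
from itertools import product


def carry_distribution(b, n):
    """Exact distribution {k: Pr[K = k]} of the number of carries K when two
    independent, uniformly random n-place radix-b integers are added,
    computed from recurrence (3)."""
    up = Fraction(b + 1, 2 * b)    # Pr[carry out of a place | carry into it]
    down = Fraction(b - 1, 2 * b)  # Pr[carry out of a place | no carry into it]
    p = {}                  # p[k] = p_{nk}: k carries, carry out of the top place
    q = {0: Fraction(1)}    # q[k] = q_{nk}: k carries, no carry out of the top place
    for _ in range(n):
        new_p, new_q = {}, {}
        for k, pr in p.items():          # previous top place produced a carry
            new_p[k + 1] = new_p.get(k + 1, 0) + up * pr
            new_q[k] = new_q.get(k, 0) + down * pr
        for k, pr in q.items():          # previous top place produced no carry
            new_p[k + 1] = new_p.get(k + 1, 0) + down * pr
            new_q[k] = new_q.get(k, 0) + up * pr
        p, q = new_p, new_q
    dist = dict(q)
    for k, pr in p.items():
        dist[k] = dist.get(k, 0) + pr
    return dist


def brute_force(b, n):
    """The same distribution, by adding every pair 0 <= u, v < b**n digit by digit."""
    counts = {}
    for u, v in product(range(b ** n), repeat=2):
        carries, carry = 0, 0
        for _ in range(n):
            carry = 1 if u % b + v % b + carry >= b else 0
            carries += carry
            u, v = u // b, v // b
        counts[carries] = counts.get(carries, 0) + 1
    return {k: Fraction(c, b ** (2 * n)) for k, c in counts.items()}


def mean(dist):
    return sum(k * pr for k, pr in dist.items())


def formula_6(b, n):
    """Mean number of carries according to Eq. (6)."""
    return Fraction(1, 2) * (n - Fraction(1, b - 1) * (1 - Fraction(1, b) ** n))
```

With exact fractions the agreement is exact. For example, `carry_distribution(4, 3) == brute_force(4, 3)` holds, and `mean(carry_distribution(b, n)) == formula_6(b, n)` holds for each pair tried.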
The variance is
$$G_n''(1) + G_n'(1) - G_n'(1)^2 = \frac14\left(n + \frac{2n}{b-1} - \frac{2b+1}{(b-1)^2} + \frac{2b+2}{(b-1)^2}\left(\frac1b\right)^{n} - \frac{1}{(b-1)^2}\left(\frac1b\right)^{2n}\right). \tag{7}$$
So the number of carries is just slightly less than $\frac12 n$ under these assumptions.

History and Bibliography. The early history of the classical algorithms described in this section is left as an interesting project for the reader. Only the history of their implementation on computers will be traced here.
The use of $10^n$ as an assumed radix when multiplying large numbers on a desk calculator was discussed by D. N. Lehmer and J. P. Ballantine, AMM 30 (1923), 67–69.

Double-precision arithmetic on computers was first treated by J. von Neumann and H. H. Goldstine [J. von Neumann, Collected Works 5, 142–151]. Theorems A and B above are due to D. A. Pope and M. L. Stein [CACM 3 (1960), 652–654]; their article also contains a bibliography of earlier work on double-precision routines.

Other ways of choosing the trial quotient $\hat q$ have been discussed by A. G. Cox and H. A. Luther, CACM 4 (1961), 353 [divide by $v_1 + 1$ instead of $v_1$], and by M. L. Stein, CACM 7 (1964), 472–474 [divide by $v_1$ or $v_1 + 1$ according to the magnitude of $v_2$]. Krishnamurthy [CACM 8 (1965), 179–181] showed that examination of the single-precision remainder in the latter method leads to an improvement over Theorem B. Krishnamurthy and Nandi, CACM 10 (1967), 809–813, suggested a way to replace the normalization and unnormalization operations of Algorithm D by a calculation of $\hat q$ based on several leading digits of the operands.

Several other methods for division have been suggested:
(1) "Fourier division" [J. Fourier, Analyse des équations déterminées (Paris, 1831), Sec. 2.21]. This method, which was often used on desk calculators, essentially obtains each new quotient digit by increasing the precision of the divisor and the dividend at each step. Some rather extensive tests by the author have indicated that this method is certainly inferior to the "divide and correct" technique above, but there may be some applications in which Fourier division is practical. See D. H. Lehmer, AMM 33 (1926), 198–206; J. V. Uspensky, Theory of Equations (New York: McGraw-Hill, 1948), 159–164.

(2) "Newton's method" for evaluating the reciprocal of a number was extensively used in early computers when there was no single-precision division instruction. The idea is to find some initial approximation $x_0$ to the number $1/v$, then to let $x_{n+1} = 2x_n - v x_n^2$. This method converges rapidly to $1/v$, since $x_n = (1 - \epsilon)/v$ implies that $x_{n+1} = (1 - \epsilon^2)/v$. Convergence to third order, i.e., with $\epsilon$ replaced by $O(\epsilon^3)$ at each step, can be obtained using the formula
$$x_{n+1} = x_n + x_n(1 - v x_n) + x_n(1 - v x_n)^2 = x_n\bigl(1 + (1 - v x_n)(1 + (1 - v x_n))\bigr),$$
and so on; see P. Rabinowitz, CACM 4 (1961), 98. For calculations on extremely large numbers, Newton's second-order method (followed by multiplication by $u$) can actually be considerably faster than Algorithm D. This requires that we increase the precision of $x_n$ at each step and also use the fast multiplication routines of Section 4.3.3. (See Algorithm 4.3.3D for details.) Some related iterative schemes have been discussed by E. V. Krishnamurthy, IEEE Trans. C-19 (1970), 227–231.

(3) Division methods have also been based on the evaluation of
$$\frac{u}{v + \epsilon} = \frac{u}{v}\left(1 - \left(\frac{\epsilon}{v}\right) + \left(\frac{\epsilon}{v}\right)^2 - \left(\frac{\epsilon}{v}\right)^3 + \cdots\right).$$
See H. H. Laughlin, AMM 37 (1930), 287–293. We have used this idea in the double-precision case (Eq. 4.2.3–(3)).

Besides the references just cited, the following early articles concerning multiple-precision arithmetic are of interest:

- High-precision floating-point routines using ones' complement arithmetic are described by A. H. Stroud and D. Secrest, Comp. J. 6 (1963), 62–66.
- Extended-precision subroutines for use in FORTRAN programs are described by B. I. Blum, CACM 8 (1965), 318–320.
- Extended-precision subroutines for use in ALGOL are described by M. Tienari and V. Suokonautio, BIT 6 (1966), 332–338.
- Arithmetic on integers with unlimited precision, making use of linked memory allocation techniques, has been elegantly described by G. E. Collins, CACM 9 (1966), 578–589.
- For a much larger repertoire of operations, including logarithms and trigonometric functions, see R. P. Brent, ACM Trans. Math. Software (to appear).

We have restricted our discussion in this section to arithmetic techniques for use in computer programming. There are many algorithms for hardware implementation of arithmetic operations that are very interesting but appear to be inapplicable to computer programs for high-precision numbers. For example, see G. W. Reitwiesner, "Binary Arithmetic," Advances in Computers 1 (New York: Academic Press, 1960), 231–308; O. L. MacSorley, Proc. IRE 49 (1961), 67–91; G. Metze, IRE Transactions EC-11 (1962), 761–764; H. L. Garner, "Number Systems and Arithmetic," Advances in Computers 6 (New York: Academic Press, 1965), 131–194.

The minimum achievable execution time for hardware addition and multiplication operations has been investigated in the following papers:

- S. Winograd, JACM 12 (1965), 277–285; 14 (1967), 793–802.
- R. P. Brent, IEEE Trans. C-19 (1970), 758–759.
- R. W. Floyd, IEEE Symp. Foundations Comp. Sci. 16 (1975), 3–5.
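Method (2) in the list of division methods above is easy to try out. The Python sketch below (function names hypothetical) runs two iterations toward $1/v$:

- the second-order iteration $x_{n+1} = 2x_n - v x_n^2$;
- the third-order formula $x_{n+1} = x_n(1 + y(1 + y))$ with $y = 1 - v x_n$.

Both use exact rational arithmetic, so the error relation can be checked exactly. If $\epsilon = 1 - v x_n$, the error should be squared (respectively cubed) at each step. The sketch shows only convergence. It makes no attempt at the increasing-precision, fast-multiplication scheme that the text recommends for very large numbers. It assumes a starting value with $0 < x_0 < 2/v$, so that $|\epsilon| < 1$.

```python
from fractions import Fraction


def newton_reciprocal(v, x0, steps):
    """Iterates of x <- 2x - v*x*x (second order), computed exactly.
    If x = (1 - eps)/v, the next iterate is (1 - eps**2)/v."""
    xs = [Fraction(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append(2 * x - v * x * x)
    return xs


def third_order_reciprocal(v, x0, steps):
    """Iterates of x <- x*(1 + y*(1 + y)) with y = 1 - v*x (third order).
    If x = (1 - eps)/v, the next iterate is (1 - eps**3)/v."""
    xs = [Fraction(x0)]
    for _ in range(steps):
        x = xs[-1]
        y = 1 - v * x          # y is the current error term eps
        xs.append(x * (1 + y * (1 + y)))
    return xs
```

Starting from $x_0 = 1/10$ with $v = 7$ (so $\epsilon_0 = 3/10$), five second-order steps bring the error term down to $(3/10)^{32}$.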
EXERCISES

1. [42] Study the early history of the classical algorithms for arithmetic by looking up the writings of, say, Sun Tsŭ, al-Khowârizmî, Fibonacci, and Robert Recorde, and by translating their methods as faithfully as possible into more precise algorithmic notation.

2. [15] Generalize Algorithm A so that it does "column addition," i.e., obtains the sum of $m$ nonnegative $n$-place integers. (Assume that $m \le b$.)

3. [21] Write a MIX program for the algorithm of exercise 2, and estimate its running time as a function of $m$ and $n$.

4. [M21] Give a formal proof of the validity of Algorithm A, using the method of "inductive assertions" as explained in Section 1.2.1.

5. [21] Algorithm A adds the two inputs by going from right to left, but sometimes the data is more readily accessible from left to right. Design an algorithm which produces the same answer as Algorithm A, but which generates the digits of the answer from left to right. It should go back to change previous values whenever a carry makes a previous value incorrect. (Note: Early Hindu and Arabic manuscripts were based on addition from left to right in this way; the right-to-left addition algorithm was a refinement due to later Arabic writers, perhaps because Arabic is written from right to left.)

6. [22] Design an algorithm which adds from left to right (as in exercise 5), but which does not store a digit of the answer until this digit cannot possibly be affected by future carries; there is to be no changing of any answer digit once it has been stored. [Hint: Keep track of the number of consecutive $(b-1)$'s which have not yet been stored in the answer.] This sort of algorithm would be appropriate, for example, in a situation where the input and output numbers are to be read and written from left to right on magnetic tapes.

7. [M26] Determine the average number of times the algorithm of exercise 5 will find that a carry makes it necessary to go back and change $k$ digits of the partial answer, for $k = 1, 2, \ldots, n$. (Assume that both inputs are independently and uniformly distributed between 0 and $b^n - 1$.)

8. [M26] Write a MIX program for the algorithm of exercise 5, and determine its average running time based on the expected number of carries as computed in the text.

9. [21] Generalize Algorithm A to obtain an algorithm which adds two $n$-place numbers in a mixed-radix number system, with bases $b_0, b_1, \ldots$ (from right to left). Thus the least significant digits lie between 0 and $b_0 - 1$, the next digits lie between 0 and $b_1 - 1$, etc.; cf. Eq. 4.1–(9).
10. [18] Would Program S work properly if the instructions on lines 06 and 07 were interchanged? If the instructions on lines 05 and 06 were interchanged?

11. [10] Design an algorithm which compares two nonnegative $n$-place integers $u = u_1 u_2 \ldots u_n$ and $v = v_1 v_2 \ldots v_n$ with radix $b$, to determine whether $u < v$, $u = v$, or $u > v$.

12. [16] Algorithm S assumes that we know which of the two input operands is the larger. If this information is not known, we could go ahead and perform the subtraction anyway, and we would find that an extra "borrow" is still present at the end of the algorithm. Design another algorithm which could be used (if there is a "borrow" present at the end of Algorithm S) to complement $w_1 w_2 \ldots w_n$ and therefore to obtain the absolute value of the difference of $u$ and $v$.

13. [21] Write a MIX program which multiplies $(u_1 u_2 \ldots u_n)_b$ by $v$, where $v$ is a single-precision number (i.e., $0 \le v < b$), producing the answer $(w_0 w_1 \ldots w_n)_b$. How much running time is required?

14. [M24] Give a formal proof of the validity of Algorithm M, using the method of "inductive assertions" as explained in Section 1.2.1.
15. [M20] Suppose we wish to form the product of two $n$-place fractions, $(.u_1 u_2 \ldots u_n)_b \times (.v_1 v_2 \ldots v_n)_b$, and to obtain only an $n$-place approximation $(.w_1 w_2 \ldots w_n)_b$ to the result. Algorithm M could be used to obtain a $2n$-place answer which is then rounded to the desired approximation. But this involves about twice as much work as is necessary for reasonable accuracy, since the products $u_i v_j$ for $i + j > n + 2$ contribute very little to the answer.

Give an estimate of the maximum error that can occur if these products $u_i v_j$ for $i + j > n + 2$ are not computed during the multiplication, but are assumed to be zero.

16. [20] Design an algorithm which divides a nonnegative $n$-place integer $u_1 u_2 \ldots u_n$ by $v$, where $v$ is a single-precision number (i.e., $0 < v < b$), producing the quotient $w_1 w_2 \ldots w_n$ and remainder $r$.

17. [M20] In the notation of Fig. 6, assume that $v_1 \ge \lfloor b/2 \rfloor$; show that if $u_0 = v_1$, we must have $q = b - 1$ or $b - 2$.

18. [M20] In the notation of Fig. 6, show that if $q' = \lfloor (u_0 b + u_1)/(v_1 + 1) \rfloor$, then $q' \le q$.

19. [M21] In the notation of Fig. 6, let $\hat q$ be an approximation to $q$, and let $\hat r = u_0 b + u_1 - \hat q v_1$. Assume that $v_1 > 0$. Show that if $v_2 \hat q > b \hat r + u_2$, then $q < \hat q$. [Hint: Strengthen the proof of Theorem A by examining the influence of $v_2$.]

20. [M22] Using the notation and assumptions of exercise 19, show that if $v_2 \hat q \le b \hat r + u_2$, then $\hat q = q$ or $q = \hat q - 1$.

21. [M23] Show that if $v_1 \ge \lfloor b/2 \rfloor$, and if $v_2 \hat q \le b \hat r + u_2$ but $\hat q \ne q$ in the notation of exercises 19 and 20, then $u - qv \ge (1 - 3/b)v$. (The latter event occurs with approximate probability $3/b$, so that when $b$ is the word size of a computer we must have $q_j = \hat q$ in Algorithm D except in very rare circumstances.)

22. [24] Find an example of a four-digit number divided by a three-digit number, using Algorithm D when the radix $b$ is 10, for which step D6 is necessary.
23. [M23] Given that $v$ and $b$ are integers, and that $1 \le v < b$, prove that $\lfloor b/2 \rfloor \le v \lfloor b/(v+1) \rfloor < (v+1) \lfloor b/(v+1) \rfloor \le b$.

24. [M20] Using the law of the distribution of leading digits explained in Section 4.2.4, give an approximate formula for the probability that $d = 1$ in Algorithm D. (When $d = 1$, it is, of course, possible to omit most of the calculation in steps D1 and D8.)

25. [26] Write a MIX routine for step D1, which is needed to complete Program D.

26. [21] Write a MIX routine for step D8, which is needed to complete Program D.

27. [M20] Prove that at the beginning of step D8 in Algorithm D, the number $u_{m+1} u_{m+2} \ldots u_{m+n}$ is always an exact multiple of $d$.

28. [M30] (A. Svoboda, Stroje na Zpracování Informací 9 (1963), 25–32.) Let $v = (v_1 v_2 \ldots v_n)_b$ be any radix-$b$ integer, where $v_1 \ne 0$. Perform the following operations:

N1. If $v_1 < b/2$, multiply $v$ by $\lfloor (b+1)/(v_1+1) \rfloor$. Let the result of this step be $(v_0 v_1 v_2 \ldots v_n)_b$.

N2. If $v_0 = 0$, set $v \gets v + (1/b) \lfloor b(b - v_1)/(v_1 + 1) \rfloor v$; let the result of this step be $(v_0 v_1 v_2 \ldots v_n . v_{n+1} \ldots)_b$. Repeat step N2 until $v_0 \ne 0$.

Prove that step N2 will be performed at most three times, and that we must always have $v_0 = 1$, $v_1 = 0$ at the end of the calculations.

[Note: If $u$ and $v$ are both multiplied by the above constants, we do not change the value of the quotient $u/v$, and the divisor has been converted into the form $(1\,0\,v_2 \ldots v_n . v_{n+1} v_{n+2} v_{n+3})_b$. This form of the divisor may be very convenient because, in the notation of Algorithm D, we may simply take $\hat q = u_j$ as a trial quotient at the beginning of step D3, or $\hat q = b - 1$ when $(u_{j-1}, u_j) = (1, 0)$.]

29. [15] Prove or disprove: At the beginning of step D7 of Algorithm D, we always have $u_j = 0$.
30. [22] If memory space is limited, it may be desirable to use the same storage locations for both input and output during the performance of some of the algorithms in this section. Is it possible to have $w_1, \ldots, w_n$ stored in the same respective locations as $u_1, \ldots, u_n$ or $v_1, \ldots, v_n$ during Algorithm A or S? Is it possible to have $q_0, \ldots, q_m$ occupy the same locations as $u_0, \ldots, u_m$ in Algorithm D? Is there any permissible overlap of memory locations between input and output in Algorithm M?

31. [28] Assume that $b = 3$ and that $u = (u_1 \ldots u_{m+n})_3$, $v = (v_1 \ldots v_n)_3$ are integers in balanced ternary notation (cf. Section 4.1), $v_1 \ne 0$. Design a long-division algorithm which divides $u$ by $v$, obtaining a remainder whose absolute value does not exceed $\frac{1}{2}|v|$. Try to find an algorithm which would be efficient if incorporated into the arithmetic circuitry of a balanced ternary computer.

32. [M40] Assume that $b = 2i$ and that $u$ and $v$ are complex numbers expressed in the quarter-imaginary number system. Design algorithms which divide $u$ by $v$, perhaps obtaining a suitable remainder of some sort, and compare their efficiency. References: M. Nadler, CACM 4 (1961), 192–193; Z. Pawlak and A. Wakulicz, Bull. de l'Acad. Polonaise des Sciences, Classe III, 5 (1957), 233–236 (see also pp. 803–804); and exercise 4.1–15.

33. [M40] Design an algorithm for taking square roots, analogous to Algorithm D and to the pencil-and-paper method for extracting square roots.

34. [40] Develop a set of computer subroutines for doing the four arithmetic operations on arbitrary integers, putting no constraint on the size of the integers except for the implicit assumption that the total memory capacity of the computer should not be exceeded. (Use linked memory allocation, so that no time is wasted in finding room to put the results.)
35. [40] Develop a set of computer subroutines for ``decuple-precision floating-point'' arithmetic, using excess 0, base $b$, nine-place floating-point number representation, where $b$ is the computer word size, and allowing a full word for the exponent. (Thus each floating-point number is represented in 10 words of memory, and all scaling is done by moving full words instead of shifting within the words.)

36. [M42] Compute the values of the fundamental constants listed in Appendix B to much higher precision than the 40-place values listed there. (Note: The first 100,000 digits of the decimal expansion of $\pi$ were published by D. Shanks and J. W. Wrench, Jr., in Math. Comp. 16 (1962), 76–99.)

*4.3.2. Modular Arithmetic
Another interesting alternative is available for doing arithmetic on large integers, based on some simple principles of number theory. The idea is to have several ``moduli'' $m_1, m_2, \ldots, m_r$ which have no common factors, and to work indirectly with ``residues'' $u \bmod m_1, u \bmod m_2, \ldots, u \bmod m_r$ instead of directly with the number $u$.

For convenience in notation throughout this section, let
$$u_1 = u \bmod m_1, \quad u_2 = u \bmod m_2, \quad \ldots, \quad u_r = u \bmod m_r. \tag{1}$$
It is easy to compute $(u_1, u_2, \ldots, u_r)$ from an integer $u$ by means of division; and, more important, no information is lost in this process, since we can always recompute $u$ from $(u_1, u_2, \ldots, u_r)$ provided that we know $u$ is not too large. For example, if $0 \le u < v \le 1000$, it is impossible to have $(u \bmod 7, u \bmod 11, u \bmod 13)$ equal to $(v \bmod 7, v \bmod 11, v \bmod 13)$, because $v - u$ would then be a positive multiple of $7 \cdot 11 \cdot 13 = 1001$. This is a consequence of the ``Chinese Remainder Theorem'' stated below.

Therefore we may regard $(u_1, u_2, \ldots, u_r)$ as a new type of internal computer representation, a ``modular representation,'' of the integer $u$.

The advantages of a modular representation are that addition, subtraction, and multiplication are very simple:
$$(u_1, \ldots, u_r) + (v_1, \ldots, v_r) = \bigl((u_1 + v_1) \bmod m_1, \ldots, (u_r + v_r) \bmod m_r\bigr), \tag{2}$$
$$(u_1, \ldots, u_r) - (v_1, \ldots, v_r) = \bigl((u_1 - v_1) \bmod m_1, \ldots, (u_r - v_r) \bmod m_r\bigr), \tag{3}$$
$$(u_1, \ldots, u_r) \times (v_1, \ldots, v_r) = \bigl((u_1 \times v_1) \bmod m_1, \ldots, (u_r \times v_r) \bmod m_r\bigr). \tag{4}$$
It is easy to prove these formulas; for example, to prove (4) we must show that $uv \bmod m_j = (u \bmod m_j)(v \bmod m_j) \bmod m_j$ for each modulus $m_j$. But this is a basic fact of elementary number theory: $x \bmod m_j = y \bmod m_j$ if and only if $x \equiv y$ (modulo $m_j$); furthermore, if $x \equiv x'$ and $y \equiv y'$, then $xy \equiv x'y'$ (modulo $m_j$); hence $(u \bmod m_j)(v \bmod m_j) \equiv uv$ (modulo $m_j$).
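The componentwise rules (2), (3), and (4) translate directly into code. The following Python sketch is only an illustration: it uses the moduli 7, 11, and 13 from the example above, and the names `to_modular`, `mod_add`, `mod_sub`, and `mod_mul` are hypothetical helpers introduced here. Python's `%` operator returns a value in the range $0 \le x < m$ for a positive modulus $m$, even when the left operand is negative, so each component needs only one expression.

```python
"""Componentwise modular arithmetic, following formulas (1) through (4)."""

# Pairwise relatively prime moduli from the example in the text;
# their product is 7 * 11 * 13 = 1001.
MODULI = (7, 11, 13)


def to_modular(u, moduli=MODULI):
    """Residues (u mod m_1, ..., u mod m_r), as in (1)."""
    return tuple(u % m for m in moduli)


def mod_add(x, y, moduli=MODULI):
    """Formula (2): add component by component."""
    return tuple((a + b) % m for a, b, m in zip(x, y, moduli))


def mod_sub(x, y, moduli=MODULI):
    """Formula (3): subtract component by component."""
    return tuple((a - b) % m for a, b, m in zip(x, y, moduli))


def mod_mul(x, y, moduli=MODULI):
    """Formula (4): multiply component by component."""
    return tuple((a * b) % m for a, b, m in zip(x, y, moduli))


if __name__ == "__main__":
    u, v = 123, 45
    U, V = to_modular(u), to_modular(v)
    # Each operation on residues agrees with the operation on integers.
    assert mod_add(U, V) == to_modular(u + v)
    assert mod_sub(U, V) == to_modular(u - v)
    assert mod_mul(U, V) == to_modular(u * v)
    # Distinct integers 0 <= w < 1000 never share a representation.
    assert len({to_modular(w) for w in range(1000)}) == 1000
    print(U, V, mod_mul(U, V))
```

No carry ever passes from one component to another, which is the property exploited in the discussion of parallel computation below.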
folio 363 galley 7
The disadvantages of a modular representation are that it is comparatively difficult to test whether a number is positive or negative, or to test whether or not $(u_1, \ldots, u_r)$ is greater than $(v_1, \ldots, v_r)$. It is also difficult to test whether or not overflow has occurred as the result of an addition, subtraction, or multiplication, and it is even more difficult to perform division. When these operations are required frequently in conjunction with addition, subtraction, and multiplication, the use of modular arithmetic can be justified only if fast means of conversion into and out of the modular representation are available. Therefore conversion between modular and positional notation is one of the principal topics of interest to us in this section.

The processes of addition, subtraction, and multiplication using (2), (3), and (4) are called residue arithmetic or modular arithmetic. The range of numbers that can be handled by modular arithmetic is equal to $m = m_1 m_2 \ldots m_r$, the product of the moduli. Hence, if each modulus fits in a single computer word, the number of moduli needed to handle $n$-digit numbers is proportional to $n$, and the amount of time required to add, subtract, or multiply $n$-digit numbers using modular arithmetic is essentially proportional to $n$ (not counting the time to convert in and out of modular representation). This is no advantage at all when addition and subtraction are considered, but it can be a considerable advantage with respect to multiplication, since the conventional method of the preceding section requires an execution time proportional to $n^2$.

Moreover, on a computer which allows many operations to take place simultaneously, modular arithmetic can be a significant advantage even for addition and subtraction; the operations with respect to different moduli can all be done at the same time, so we obtain a substantial increase in speed. The same kind of decrease in execution time could not be achieved by the conventional techniques discussed in the previous section, since carry propagation must be considered. Perhaps some day highly parallel computers will make simultaneous operations commonplace, so that modular arithmetic will be of significant importance in ``real-time'' calculations when a quick answer to a single problem requiring high precision is needed. (With highly parallel computers, it is often preferable to run $k$ separate programs simultaneously, instead of running a single program $k$ times as fast, since the latter alternative is more complicated but does not utilize the machine any more efficiently; ``real-time'' calculations are exceptions which make the inherent parallelism of modular arithmetic more significant.)

Now let us examine the basic fact which underlies the modular representation of numbers:

Theorem C (Chinese Remainder Theorem). Let $m_1, m_2, \ldots, m_r$ be positive integers which are relatively prime in pairs, i.e.,
$$\gcd(m_j, m_k) = 1 \quad \text{when } j \ne k. \tag{5}$$
Let $m = m_1 m_2 \ldots m_r$, and let $a, u_1, u_2, \ldots, u_r$ be integers. Then there is exactly one integer $u$ which satisfies the conditions
$$a \le u < a + m, \quad \text{and} \quad u \equiv u_j \ (\text{modulo } m_j) \quad \text{for } 1 \le j \le r. \tag{6}$$

Proof. If $u \equiv v$ (modulo $m_j$) for $1 \le j \le r$, then $u - v$ is a multiple of $m_j$ for all $j$, so (5) implies that $u - v$ is a multiple of $m = m_1 m_2 \ldots m_r$.
Since two solutions in the interval $a \le u < a + m$ differ by less than $m$, this argument shows that there is at most one solution of (6). To complete the proof we need only show the existence of at least one solution, and this can be done in two simple ways:

Method 1 (``Nonconstructive'' proof). As $u$ runs through the $m$ distinct values $a \le u < a + m$, the $r$-tuples $(u \bmod m_1, \ldots, u \bmod m_r)$ must also run through $m$ distinct values, since (6) has at most one solution. But there are exactly $m_1 m_2 \ldots m_r$ possible $r$-tuples $(v_1, \ldots, v_r)$ such that $0 \le v_j < m_j$. Therefore each $r$-tuple must occur exactly once, and there must be some value of $u$ for which $(u \bmod m_1, \ldots, u \bmod m_r) = (u_1 \bmod m_1, \ldots, u_r \bmod m_r)$.

Method 2 (``Constructive'' proof). We can find numbers $M_j$, $1 \le j \le r$, such that
$$M_j \equiv 1 \ (\text{modulo } m_j) \quad \text{and} \quad M_j \equiv 0 \ (\text{modulo } m_k) \quad \text{for } k \ne j. \tag{7}$$
This follows because (5) implies that $m_j$ and $m/m_j$ are relatively prime, so we may take
$$M_j = (m/m_j)^{\varphi(m_j)} \tag{8}$$
by Euler's theorem (exercise 1.2.4–28).
Now the number
$$u = a + \bigl((u_1 M_1 + u_2 M_2 + \cdots + u_r M_r - a) \bmod m\bigr) \tag{9}$$
satisfies all the conditions of (6).

A very special case of this theorem was stated by the Chinese mathematician Sun-Tsŭ, who gave a rule called tái-yen (``great generalization''); the date of his writing is very uncertain, but it is thought to be between 280 and 473 A.D.
[See Joseph Needham, Science and Civilization in China 3 (Cambridge University Press, 1959), 33–34, for an interesting discussion.] Theorem C was apparently first stated and proved in its proper generality by Chhin Chiu-Shao in his Shu Shu Chiu Chang (1247). Numerous early contributions to this theory have been summarized by L. E. Dickson in his History of the Theory of Numbers 2 (New York: Chelsea, 1952), 57–64.

As a consequence of Theorem C, we may use modular representation for numbers in any consecutive interval of $m = m_1 m_2 \ldots m_r$ integers. For example, we could take $a = 0$ in (6), and work only with nonnegative integers $u$ less than $m$. On the other hand, when addition and subtraction are being done, as well as multiplication, it is usually most convenient to assume that all the moduli $m_1, m_2, \ldots, m_r$ are odd numbers, so that $m = m_1 m_2 \ldots m_r$ is odd, and to work with integers in the range
$$-\tfrac{m}{2} < u < \tfrac{m}{2}, \tag{10}$$
which is completely symmetrical about zero.

To perform the basic operations indicated in (2), (3), and (4), we need to compute $(u_j + v_j) \bmod m_j$, $(u_j - v_j) \bmod m_j$, and $u_j v_j \bmod m_j$, when $0 \le u_j, v_j < m_j$. If $m_j$ is a single-precision number, it is most convenient to form $u_j v_j \bmod m_j$ by doing a multiplication and then a division operation.
For addition and subtraction, the situation is a little simpler, since no division is necessary; the following formulas may conveniently be used:
$$(u_j + v_j) \bmod m_j = \begin{cases} u_j + v_j, & \text{if } u_j + v_j < m_j; \\ u_j + v_j - m_j, & \text{if } u_j + v_j \ge m_j. \end{cases} \tag{11}$$
$$(u_j - v_j) \bmod m_j = \begin{cases} u_j - v_j, & \text{if } u_j - v_j \ge 0; \\ u_j - v_j + m_j, & \text{if } u_j - v_j < 0. \end{cases} \tag{12}$$
(Cf. Section 3.2.1.1.) In this case, since we want $m$ to be as large as possible, it is easiest to let $m_1$ be the largest odd number that fits in a computer word, to let $m_2$ be the largest odd number $< m_1$ that is relatively prime to $m_1$, to let $m_3$ be the largest odd number $< m_2$ that is relatively prime to both $m_1$ and $m_2$, and so on until enough $m_j$'s have been found to give the desired range $m$. Efficient ways to determine whether or not two integers are relatively prime are discussed in Section 4.5.2.

As a simple example, suppose that we have a decimal computer with a word size of only 100. Then the procedure described in the previous paragraph would give
$$m_1 = 99, \quad m_2 = 97, \quad m_3 = 95, \quad m_4 = 91, \quad m_5 = 89, \quad m_6 = 83, \tag{13}$$
and so on.

On binary computers it is sometimes desirable to choose the $m_j$ in a different way, by selecting
$$m_j = 2^{e_j} - 1. \tag{14}$$
In other words, each modulus is one less than a power of 2. Such a choice of $m_j$ often makes the basic arithmetic operations simpler, because it is relatively easy to work modulo $2^{e_j} - 1$, as in ones' complement arithmetic. When the moduli are chosen according to this strategy, it is helpful to relax the condition $0 \le u_j < m_j$ slightly, so that we require only
$$0 \le u_j < 2^{e_j}, \qquad u_j \equiv u \ (\text{modulo } 2^{e_j} - 1). \tag{15}$$
folio 366 galley 8
Thus, the value $u_j = m_j = 2^{e_j} - 1$ is allowed as an optional alternative to $u_j = 0$, since this does not affect the validity of Theorem C, and it means we are allowing $u_j$ to be any $e_j$-bit binary number. Under this assumption, the operations of addition and multiplication modulo $m_j$ become the following:
$$u_j \oplus v_j = \begin{cases} u_j + v_j, & \text{if } u_j + v_j < 2^{e_j}; \\ \bigl((u_j + v_j) \bmod 2^{e_j}\bigr) + 1, & \text{if } u_j + v_j \ge 2^{e_j}. \end{cases} \tag{16}$$
$$u_j \otimes v_j = (u_j v_j \bmod 2^{e_j}) \oplus \lfloor u_j v_j / 2^{e_j} \rfloor. \tag{17}$$
[Here $\oplus$ and $\otimes$ refer to the operations to be done on the individual components of $(u_1, \ldots, u_r)$ and $(v_1, \ldots, v_r)$ when adding or multiplying, respectively, using the convention (15).] Equation (12) may be used for subtraction.
Clearly, these operations can be readily performed even when $m_j$ is larger than the computer's word size; it is a simple matter to compute the remainder of a positive number modulo a power of 2, or to divide a number by a power of 2. In (17) we have the sum of the ``upper half'' and the ``lower half'' of the product, as discussed in exercise 3.2.1.1–8.

If moduli of the form $2^{e_j} - 1$ are to be used, we must know under what conditions the number $2^e - 1$ is relatively prime to the number $2^f - 1$. Fortunately, there is a very simple rule,
$$\gcd(2^e - 1, 2^f - 1) = 2^{\gcd(e, f)} - 1, \tag{18}$$
which states in particular that $2^e - 1$ and $2^f - 1$ are relatively prime if and only if $e$ and $f$ are relatively prime. Equation (18) follows from Euclid's algorithm and the identity
$$(2^e - 1) \bmod (2^f - 1) = 2^{e \bmod f} - 1. \tag{19}$$
(See exercise 6.) Thus we could choose, for example, $m_1 = 2^{35} - 1$, $m_2 = 2^{34} - 1$, $m_3 = 2^{33} - 1$, $m_4 = 2^{31} - 1$, $m_5 = 2^{29} - 1$, if we had a computer with word size $2^{35}$ and if we wanted to represent numbers up to $m_1 m_2 m_3 m_4 m_5 > 2^{161}$. This range of integers is not big enough to make modular arithmetic faster than the conventional method, and we usually find that modular arithmetic using convention (15) is advantageous only when the $m_j$ are larger than the word size or
when division is inconvenient.

As we have already observed, the operations of conversion to and from modular representation are very important. If we are given a number $u$, its modular representation $(u_1, \ldots, u_r)$ may be obtained by dividing $u$ by $m_1, \ldots, m_r$ and saving the remainders. A possibly more attractive procedure, if $u = (v_m v_{m-1} \ldots v_0)_b$, is to evaluate the polynomial
$$(\ldots(v_m b + v_{m-1}) b + \cdots) b + v_0$$
using modular arithmetic. When $b = 2$ and when the modulus $m_j$ has the special form $2^{e_j} - 1$, both of these methods reduce to quite a simple procedure:
Consider the binary representation of $u$ with blocks of $e_j$ bits grouped together,
$$u = a_t A^t + a_{t-1} A^{t-1} + \cdots + a_1 A + a_0, \tag{20}$$
where $A = 2^{e_j}$ and $0 \le a_k < 2^{e_j}$ for $0 \le k \le t$. Then
$$u \equiv a_t + a_{t-1} + \cdots + a_1 + a_0 \ (\text{modulo } 2^{e_j} - 1), \tag{21}$$
since $A \equiv 1$ (modulo $2^{e_j} - 1$). Therefore we may obtain $u_j$ by adding the $e_j$-bit numbers $a_t \oplus \cdots \oplus a_1 \oplus a_0$, modulo $2^{e_j} - 1$, as in Eq. (16). This process is similar to the familiar device of ``casting out nines,'' which is used to determine $u \bmod 9$ when $u$ is expressed in the decimal system.

Conversion back from modular form to positional notation is somewhat more difficult. It is interesting in this regard to make a few side remarks about the way computers make us change our viewpoint towards mathematical proofs: Theorem C tells us that the conversion from $(u_1, \ldots, u_r)$ to $u$ is possible, and two proofs are given. The first proof we considered is a classical one which makes use only of very simple concepts, namely the facts that

i) any number which is a multiple of $m_1$, of $m_2$, ..., and of $m_r$ must be a multiple of $m_1 m_2 \ldots m_r$ when the $m_j$'s are pairwise relatively prime; and
ii) if $m$ things are put into $m$ boxes with no two things in the same box, then there must be one in each box.

By traditional notions of mathematical aesthetics, this is no doubt the nicest proof of Theorem C; but from a computational standpoint it is completely worthless! It amounts to saying, ``Try $u = a, a + 1, \ldots$ until you find a value for which $u \equiv u_1$ (modulo $m_1$), ..., $u \equiv u_r$ (modulo $m_r$).''
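For concreteness, here is Method 1 transcribed literally into Python; the function name `crt_brute_force` is hypothetical. The sketch is correct, but its loop may examine all $m = m_1 m_2 \ldots m_r$ candidates, and $m$ grows exponentially with the number of digits being represented, which is exactly why this proof is of no computational use.

```python
def crt_brute_force(residues, moduli, a=0):
    """Find the unique u with a <= u < a + m and u = u_j (modulo m_j).

    This is the search implied by Method 1: try u = a, a + 1, ...
    Assumes the moduli are pairwise relatively prime (condition (5)),
    so Theorem C guarantees that exactly one u is found.
    """
    m = 1
    for mj in moduli:
        m *= mj
    for u in range(a, a + m):
        if all((u - uj) % mj == 0 for uj, mj in zip(residues, moduli)):
            return u
    raise ValueError("no solution: are the moduli pairwise relatively prime?")


if __name__ == "__main__":
    # u = 123 has residues (4, 2, 6) modulo (7, 11, 13).
    print(crt_brute_force((4, 2, 6), (7, 11, 13)))
    # With m = 1001, a = -500 gives the symmetric range (10).
    print(crt_brute_force((3, 7, 9), (7, 11, 13), a=-500))
```

The optional argument `a` plays the role of $a$ in (6); the residues need not be reduced in advance, since the test `(u - uj) % mj == 0` checks the congruence directly.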
The second proof of Theorem C is more explicit; it shows how to compute $r$ new constants $M_1, \ldots, M_r$, and to get the solution in terms of these constants by formula (9). This proof uses more complicated concepts (for example, Euler's theorem), but it is much more satisfactory from a computational standpoint, since the constants $M_1, \ldots, M_r$ need to be determined only once. On the other hand, the determination of $M_j$ by Eq. (8) is certainly not trivial, since the evaluation of Euler's $\varphi$-function requires, in general, the factorization of $m_j$ into prime powers.
Furthermore, $M_j$ is likely to be a terribly large number, even if we compute only the quantity $M_j \bmod m$ (which will work just as well as $M_j$ in (9)). Since $M_j \bmod m$ is uniquely determined if (7) is to be satisfied (because of the Chinese Remainder Theorem!), we can see that, in any event, Eq. (9) requires a lot of high-precision calculation, and such calculation is just what we wished to avoid by modular arithmetic in the first place.

So we need an even better proof of Theorem C if we are going to have a really usable method of conversion from $(u_1, \ldots, u_r)$ to $u$. Such a method was suggested by H. L. Garner in 1958; it can be carried out using $\binom{r}{2}$ constants $c_{ij}$ for $1 \le i < j \le r$, where
$$c_{ij} m_i \equiv 1 \ (\text{modulo } m_j). \tag{22}$$
These constants $c_{ij}$ are readily computed using Euclid's algorithm, since Algorithm 4.5.2X determines $a$, $b$ such that $a m_i + b m_j = \gcd(m_i, m_j) = 1$, and we may take $c_{ij} = a$. When the moduli have the special form $2^{e_j} - 1$, a simple method of determining $c_{ij}$ is given in exercise 6.

Once the $c_{ij}$ have been determined satisfying (22), we can set
$$\begin{aligned} v_1 &\gets u_1 \bmod m_1, \\ v_2 &\gets (u_2 - v_1) c_{12} \bmod m_2, \\ v_3 &\gets \bigl((u_3 - v_1) c_{13} - v_2\bigr) c_{23} \bmod m_3, \\ &\;\;\vdots \\ v_r &\gets \bigl(\ldots\bigl((u_r - v_1) c_{1r} - v_2\bigr) c_{2r} - \cdots - v_{r-1}\bigr) c_{(r-1)r} \bmod m_r. \end{aligned} \tag{23}$$
Then
$$u = v_r m_{r-1} \ldots m_1 + \cdots + v_3 m_2 m_1 + v_2 m_1 + v_1 \tag{24}$$
is a number satisfying the conditions
$$0 \le u < m, \qquad u \equiv u_j \ (\text{modulo } m_j), \qquad 1 \le j \le r. \tag{25}$$
(See exercise 8; another way of rewriting (23) which does not involve as many auxiliary constants is given in exercise 7.) Equation (24) is a mixed-radix representation of $u$, which may be converted to binary or decimal notation using the methods of Section 4.4. If $0 \le u < m$ is not the desired range, an appropriate multiple of $m$ can be added or subtracted after the conversion process.

The advantage of the computation shown in (23) is that the calculation of $v_j$ can be done using only arithmetic mod $m_j$, which is already built into the modular arithmetic algorithms. Furthermore, (23) allows parallel computation: We can start with $(v_1, \ldots, v_r) \gets (u_1 \bmod m_1, \ldots, u_r \bmod m_r)$; then at time $j$, for $1 \le j < r$, we simultaneously set $v_k \gets (v_k - v_j) c_{jk} \bmod m_k$ for $j < k \le r$. An alternative way to compute the mixed-radix representation, allowing similar possibilities for parallelism, has been discussed by A. S. Fraenkel, Proc. ACM Nat. Conf. 19 (Philadelphia, 1965), E1.4.

It is important to observe that the mixed-radix representation (24) is sufficient to compare the magnitudes of two modular numbers. For if we know that $0 \le u < m$ and $0 \le u' < m$, then we can tell if $u < u'$ by first doing the conversion to $v_1, \ldots, v_r$ and $v'_1, \ldots, v'_r$, then testing if $v_r < v'_r$, or if $v_r = v'_r$ and $v_{r-1} < v'_{r-1}$, etc. It is not necessary to convert all the way to binary or decimal notation if we only want to know whether $(u_1, \ldots, u_r)$ is less than $(u'_1, \ldots, u'_r)$.
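The following Python sketch follows (22), (23), and (24), shows the time-step schedule for parallel computation, and compares two numbers by their mixed-radix digits, most significant first. It assumes Python 3.8 or later, where `pow(x, -1, m)` returns the inverse of $x$ modulo $m$ (computed by Python's own extended Euclidean routine, not by Algorithm 4.5.2X as such); the function names and the 0-based indexing are our own choices, and the moduli are the first four values from (13).

```python
"""Garner's conversion (22)-(24) and comparison via mixed-radix digits."""


def garner_constants(moduli):
    """c[(i, j)] with c[(i, j)] * m_i = 1 (modulo m_j) for i < j, as in (22)."""
    r = len(moduli)
    return {(i, j): pow(moduli[i], -1, moduli[j])
            for i in range(r) for j in range(i + 1, r)}


def to_mixed_radix(residues, moduli, c):
    """Digits v_1, ..., v_r of (23), computed one after another."""
    v = []
    for j, (uj, mj) in enumerate(zip(residues, moduli)):
        t = uj % mj
        for i in range(j):
            t = (t - v[i]) * c[(i, j)] % mj
        v.append(t)
    return v


def to_mixed_radix_parallel(residues, moduli, c):
    """Same digits, using the time-step schedule described in the text.

    At step j every v_k with k > j is updated; on a parallel machine
    these updates are independent and could happen simultaneously.
    """
    v = [uj % mj for uj, mj in zip(residues, moduli)]
    r = len(moduli)
    for j in range(r - 1):
        for k in range(j + 1, r):
            v[k] = (v[k] - v[j]) * c[(j, k)] % moduli[k]
    return v


def from_mixed_radix(v, moduli):
    """Evaluate (24): u = v_r m_{r-1}...m_1 + ... + v_2 m_1 + v_1."""
    u = 0
    for vj, mj in zip(reversed(v), reversed(moduli)):
        u = u * mj + vj
    return u


def less_than(x_residues, y_residues, moduli, c):
    """Decide u < u' (both in 0 <= u < m) from mixed-radix digits alone."""
    vx = to_mixed_radix(x_residues, moduli, c)
    vy = to_mixed_radix(y_residues, moduli, c)
    # Compare v_r first, then v_{r-1}, and so on.
    return vx[::-1] < vy[::-1]


if __name__ == "__main__":
    M = (99, 97, 95, 91)  # m_1, ..., m_4 from (13)
    C = garner_constants(M)
    u, w = 1234567, 7654321
    U = [u % mj for mj in M]
    W = [w % mj for mj in M]
    digits = to_mixed_radix(U, M, C)
    print(digits, from_mixed_radix(digits, M))
    print(less_than(U, W, M, C), less_than(W, U, M, C))
```

Only arithmetic modulo the individual $m_j$ is needed to obtain the digits; `from_mixed_radix` is the step where multiple-precision arithmetic would enter on a real machine, and `less_than` avoids it entirely.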
The operation of comparing two numbers, or of deciding if a modular number is negative, is intuitively very simple, so we would expect to find a much easier method for making this test than the conversion to mixed-radix form. But the following theorem shows that there is little hope of finding a substantially easier method, since the range of a modular number depends essentially on all of its residues $u_1, \ldots, u_r$:
folio 370 galley 9
Theorem S (Nicholas Szabó, 1961). In terms of the notation above, assume that $m_1 < \sqrt{m}$, and let $L$ be any value in the range
$$m_1 \le L \le m - m_1. \tag{26}$$
Let $g$ be any function such that the set $\{g(0), g(1), \ldots, g(m_1 - 1)\}$ contains fewer than $m_1$ values. Then there are numbers $u$ and $v$ such that
$$g(u \bmod m_1) = g(v \bmod m_1), \qquad u \bmod m_j = v \bmod m_j \quad \text{for } 2 \le j \le r; \tag{27}$$
$$0 \le u < L \le v < m. \tag{28}$$

Proof. By hypothesis, there must exist numbers $u \ne v$ satisfying (27), since $g$ must take on the same value for two different residues. Let $(u, v)$ be a pair of values with $0 \le u < v < m$ satisfying (27), for which $u$ is a minimum. Since $u' = u - m_1$ and $v' = v - m_1$ also satisfy (27), we must have $u' < 0$ by the minimality of $u$. Hence $u < m_1 \le L$; and if (28) does not hold, we must have $v < L$.
 9769  |πBut |εv|4|¬Q|4u, |πand |εv|4α_↓|4u |πis a multiple 
 9776  of |εm|β2|4.|4.|4.|4m|βr|4α=↓|4m/m|β1, |πso |εv|4|¬R|4v|4α_↓
 9779  |4u|4|¬R|4m/m|β1|4|¬Q|4m|β1. |πTherefore, if 
 9782  (28) does not hold for |ε(u,|4v), |πit will be 
 9791  satis_ed for the pair (|εu|¬C,|4v|¬C)|4α=↓|4(v|4α_↓|4m|β1,|4
 9795  u|4α+↓|4m|4α_↓|4m|β1).|'{A12}|π!|9|4|1|1|1Of 
course, a similar result can be proved for any $m_j$ in place of $m_1$; and we could also replace (28) by the condition "$a \le u < a + L \le v < a + m$" with only minor changes in the proof. Therefore Theorem S shows that many simple functions cannot be used to determine the range of a modular number.

Let
us now reiterate the main points of the discussion in this section: Modular arithmetic can be a significant advantage for applications in which the predominant calculations involve exact multiplication (or raising to a power) of large integers, combined with addition and subtraction, but where there is very little need to divide or compare numbers, or to test whether intermediate results "overflow" out of range. (It is important not to forget the latter restriction; methods are available to test for overflow, as in exercise 12, but they are in general so complicated that they nullify the advantages of modular arithmetic.) Several applications for modular computations have been discussed by H. Takahasi and Y. Ishibashi, Information Processing in Japan 1 (1961), 28–42.

An
example of such an application is the exact solution of linear equations with rational coefficients. For various reasons it is desirable in this case to assume that the moduli $m_1, m_2, \ldots, m_r$ are all large prime numbers; the linear equations can be solved independently modulo each $m_j$. A detailed discussion of this procedure has been given by I. Borosh and A. S. Fraenkel [Math. Comp. 20 (1966), 107–112]. By means of their method, the nine independent solutions of a system of 111 linear equations in 120 unknowns were obtained exactly in less than one hour's running time on a CDC 1604 computer. The same procedure is worthwhile also for solving simultaneous linear equations with floating-point coefficients, when the matrix of coefficients is ill-conditioned. The modular technique (treating the given floating-point coefficients as exact rational numbers) gives a method for obtaining the true answers in less time than conventional methods can produce reliable approximate answers! [See M. T. McClellan, JACM 20 (1973), 563–588, for further developments of this approach; and see also E. H. Bareiss, J. Inst. Math. and Appl. 10 (1972), 68–104, for a discussion of its limitations.]

The
published literature concerning modular arithmetic is mostly oriented towards hardware design, since the carry-free properties of modular arithmetic make it attractive from the standpoint of high-speed operation. The idea was first published by A. Svoboda and M. Valach in the Czechoslovakian journal Stroje na Zpracování Informací 3 (1955), 247–295; then independently by H. L. Garner [IRE Transactions EC-8 (1959), 140–147]. The use of moduli of the form $2^{e_j} - 1$ was suggested by A. S. Fraenkel [JACM 8 (1961), 87–96], and several advantages of such moduli were demonstrated by A. Schönhage [Computing 1 (1966), 182–196]. See the book Residue Arithmetic and its Applications to Computer Technology by N. S. Szabó and R. I. Tanaka (New York: McGraw-Hill, 1967), for additional information and a comprehensive bibliography of the subject.
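To make the carry-free property concrete, here is a small Python sketch (3.8 or later) using three moduli of the form $2^{e_j} - 1$. The exponents 5, 7, 9 are a hypothetical choice. The reduction routine relies only on the identity $2^e \equiv 1$ (modulo $2^e - 1$). It is one illustration of why such moduli suit binary hardware, not a summary of Fraenkel's or Schönhage's arguments. Addition and multiplication act on each residue separately, and conversion back uses the Chinese remainder theorem. Results are exact only while the true value stays below $m = 31 \cdot 127 \cdot 511$.

```python
from math import prod

# Hypothetical parameters: exponents 5, 7, 9 are pairwise relatively prime,
# and gcd(2^a - 1, 2^b - 1) = 2^gcd(a, b) - 1, so the moduli
# 31, 127, 511 are pairwise relatively prime too.
EXPONENTS = (5, 7, 9)
MODULI = [(1 << e) - 1 for e in EXPONENTS]
M = prod(MODULI)  # 2011807


def reduce_2e_minus_1(x, e):
    """x mod (2^e - 1) for x >= 0 using only shifts, masks and additions:
    since 2^e = 1 (modulo 2^e - 1), the high bits fold onto the low bits."""
    mask = (1 << e) - 1
    while x > mask:
        x = (x & mask) + (x >> e)
    return 0 if x == mask else x


def to_residues(u):
    return [u % m for m in MODULI]


def add(a, b):
    # Each component is computed independently: no carries between them.
    return [reduce_2e_minus_1(x + y, e) for x, y, e in zip(a, b, EXPONENTS)]


def mul(a, b):
    return [reduce_2e_minus_1(x * y, e) for x, y, e in zip(a, b, EXPONENTS)]


def from_residues(res):
    """Chinese-remainder reconstruction of the unique u with 0 <= u < M."""
    u = 0
    for r, m in zip(res, MODULI):
        rest = M // m
        u += r * rest * pow(rest, -1, m)
    return u % M


a, b = to_residues(1234), to_residues(1500)
print(from_residues(add(a, b)))  # 2734
print(from_residues(mul(a, b)))  # 1851000, which is < M, so it is exact
```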
Further discussion of modular arithmetic can be found in part B of Section 4.3.3.

EXERCISES

1. [20] Find all integer numbers $u$ which satisfy the conditions $u \bmod 7 = 1$, $u \bmod 11 = 6$, $u \bmod 13 = 5$, and $0 \le u < 1000$.

2. [M20] Would Theorem C still be true if we allowed $a, u_1, u_2, \ldots, u_r$ and $u$ to be arbitrary real numbers (not just integers)?

3. [M26] (Generalized Chinese Remainder Theorem.) Let $m_1, m_2, \ldots, m_r$ be positive integers. Let $m$ be the least common multiple of $m_1, m_2, \ldots, m_r$, and let $a, u_1, u_2, \ldots, u_r$ be any integers. Prove that there is exactly one integer $u$ which satisfies the conditions
$$a \le u < a + m, \qquad u \equiv u_j \pmod{m_j}, \quad 1 \le j \le r,$$
provided that
$$u_i \equiv u_j \pmod{\gcd(m_i, m_j)}, \qquad 1 \le i < j \le r;$$
and there is no such integer $u$ when the latter condition fails to hold.

4. [20] Continue the process shown in (13); what would $m_7, m_8, m_9, m_{10}$ be?
5. [M23] Suppose that the method of (13) is continued until no more $m_j$ can be chosen; does this method give the largest attainable value $m_1 m_2 \ldots m_r$ such that the $m_j$ are odd positive integers less than 100 which are relatively prime in pairs?
6. [M22] Let $e$, $f$, $g$ be nonnegative integers. (a) Show that $2^e \equiv 2^f \pmod{2^g - 1}$ if and only if $e \equiv f \pmod{g}$. (b) Given that $e \bmod f = d$ and $ce \bmod f = 1$, prove that
$$\bigl((1 + 2^d + \cdots + 2^{(c-1)d}) \cdot (2^e - 1)\bigr) \bmod (2^f - 1) = 1.$$
[Thus, we have a comparatively simple formula for the inverse of $2^e - 1$, modulo $2^f - 1$, as required in (22).]

7. [M21]
Show that (23) can be rewritten as follows:
$$\begin{aligned}
v_1 &\leftarrow u_1 \bmod m_1,\\
v_2 &\leftarrow (u_2 - v_1)\,c_{12} \bmod m_2,\\
v_3 &\leftarrow \bigl(u_3 - (v_1 + m_1 v_2)\bigr)\,c_{13}c_{23} \bmod m_3,\\
&\;\;\vdots\\
v_r &\leftarrow \bigl(u_r - (v_1 + m_1(v_2 + m_2(v_3 + \cdots + m_{r-2}v_{r-1})\ldots))\bigr)\,c_{1r}\ldots c_{(r-1)r} \bmod m_r.
\end{aligned}$$
If the formulas are rewritten in this way, we see that only $r - 1$ constants $C_j = c_{1j} \ldots c_{(j-1)j} \bmod m_j$ are needed instead of $r(r-1)/2$ constants $c_{ij}$ as in (23). Discuss the relative merits of this version of the formula as compared to (23), from the standpoint of computer calculation.

8. [M21]
Prove that the number $u$ defined by (23) and (24) satisfies (25).

9. [M20] Show how to go from the values $v_1, \ldots, v_r$ of the mixed radix notation (24) back to the original residues $u_1, \ldots, u_r$, using only arithmetic mod $m_j$ to compute $u_j$.

10. [M25] An integer $u$ which lies in the symmetrical range (10) might be represented by finding the numbers $u_1, \ldots, u_r$ such that $u \equiv u_j \pmod{m_j}$ and $-m_j/2 < u_j < m_j/2$, instead of insisting that $0 \le u_j < m_j$ as in the text. Discuss the modular arithmetic procedures that would be used in this case (including the conversion process, (23)).

11.
[M23] Assume that all the $m_j$ are odd, and that $u = (u_1, \ldots, u_r)$ is known to be even, where $0 \le u < m$. Find a reasonably fast method to compute $u/2$ using modular arithmetic.

12. [M10] Prove that, if $0 \le u, v < m$, the modular addition of $u$ and $v$ causes overflow (i.e., is outside the range allowed by the modular representation) if and only if the sum is less than $u$. (Thus the overflow detection problem is equivalent to the comparison problem.)

13.
[M25] (Automorphic numbers.) An $n$-place decimal number $x > 1$ is called an "automorph" by recreational mathematicians if the last $n$ digits of $x^2$ are equal to $x$; i.e., if $x^2 \bmod 10^n = x$. [See Scientific American 218 (January, 1968), 125.] For example, 9376 is a 4-place automorph, since $9376^2 = 87909376$.

(a) Prove that an $n$-place number $x > 1$ is an automorph if and only if $x \bmod 5^n = 0$ or 1, and $x \bmod 2^n = 1$ or 0, respectively. [Thus, if $m_1 = 2^n$ and $m_2 = 5^n$, the only two $n$-place automorphs are the numbers $M_1$ and $M_2$ in (7).]

(b) Prove that if $x$ is an $n$-place automorph, then $(3x^2 - 2x^3) \bmod 10^{2n}$ is a $2n$-place automorph.

(c) Given that $cx \equiv 1 \pmod{y}$, what is a simple formula for a number $c'$ such that $c'x^2 \equiv 1 \pmod{y^2}$?
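A quick numerical check of the claim in exercise 13(b) is sketched below in Python, starting from the 2-place automorph 76. It is not a proof. It assumes that an "$n$-place" number may have leading zeros, so that, for example, 0625 counts as a 4-place automorph. `is_automorph` and `double_places` are hypothetical helper names.

```python
def is_automorph(x, n):
    """x^2 mod 10^n == x for 1 < x < 10^n.  Leading zeros are allowed,
    i.e. an "n-place" number may start with 0 (an assumption)."""
    return 1 < x < 10**n and x * x % 10**n == x


def double_places(x, n):
    """The map of exercise 13(b): x -> (3x^2 - 2x^3) mod 10^(2n)."""
    return (3 * x**2 - 2 * x**3) % 10**(2 * n)


x, n = 76, 2                  # 76^2 = 5776
for _ in range(3):
    assert is_automorph(x, n)
    print(n, x)               # prints 2 76, then 4 9376, then 8 87109376
    x, n = double_places(x, n), 2 * n
```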
*4.3.3. How Fast Can We Multiply?

The conventional method for multiplication, Algorithm 4.3.1M, requires approximately $cmn$ operations to multiply an $m$-digit number by an $n$-digit number, where $c$ is a constant. In this section, let us assume for convenience that $m = n$, and let us consider the following question: Does every general computer algorithm for multiplying two $n$-digit numbers require an execution time proportional to $n^2$, as $n$ increases?

(In
this question, a "general" algorithm means one which accepts, as input, the number $n$ and two arbitrary $n$-digit numbers in positional notation, and whose output is their product in positional form. Certainly if we were allowed to choose a different algorithm for each value of $n$, the question would be of no interest, since multiplication could be done for any specific value of $n$ by a "table-lookup" operation in some huge table. The term "computer algorithm" is meant to imply an algorithm which is suitable for implementation on a digital computer such as MIX, and the execution time is to be the time it takes to perform the algorithm on such a computer.)

A. Digital methods. The answer to the above question is, rather surprisingly, "No," and, in fact, it is not very difficult to see why. For convenience, let us assume throughout this section that we are working with integers expressed in binary notation. If we have two $2n$-bit numbers $u = (u_{2n-1} \ldots u_1 u_0)_2$ and $v = (v_{2n-1} \ldots v_1 v_0)_2$, we can write
$$u = 2^n U_1 + U_0, \qquad v = 2^n V_1 + V_0, \tag{1}$$
where $U_1 = (u_{2n-1} \ldots u_n)_2$ is the "most-significant half" of $u$ and $U_0 = (u_{n-1} \ldots u_0)_2$ is the "least-significant half"; and similarly $V_1 = (v_{2n-1} \ldots v_n)_2$, $V_0 = (v_{n-1} \ldots v_0)_2$.
Now we have
$$uv = (2^{2n} + 2^n)U_1V_1 + 2^n(U_1 - U_0)(V_0 - V_1) + (2^n + 1)U_0V_0. \tag{2}$$
(Expanding the middle product shows that the right-hand side equals $2^{2n}U_1V_1 + 2^n(U_1V_0 + U_0V_1) + U_0V_0$, which is $uv$ by (1).) This formula reduces the problem of multiplying $2n$-bit numbers to three multiplications of $n$-bit numbers, $U_1V_1$, $(U_1 - U_0)(V_0 - V_1)$, and $U_0V_0$, plus some simple shifting and adding operations.

Formula (2)
can be used for double-precision multiplication when a quadruple-precision result is desired, and it is just a little faster than the traditional method on many machines. It is more important to observe that we can use formula (2) to define a recursive process for multiplication which is significantly faster than the familiar order-$n^2$ method when $n$ is large: If $T(n)$ is the time required to perform multiplication of $n$-bit numbers, we have
$$T(2n) \le 3T(n) + cn \tag{3}$$
for some constant $c$, since the right-hand side of (2) uses just three multiplications plus some additions and shifts. Relation (3) implies by induction that
$$T(2^k) \le c(3^k - 2^k), \qquad k \ge 1, \tag{4}$$
if we choose $c$ to be large enough so that this inequality is valid when $k = 1$ (the induction step is $T(2^{k+1}) \le 3c(3^k - 2^k) + c\,2^k = c(3^{k+1} - 2^{k+1})$); and therefore we have
$$T(n) \le T(2^{\lceil \lg n \rceil}) \le c\bigl(3^{\lceil \lg n \rceil} - 2^{\lceil \lg n \rceil}\bigr) \le 3c \cdot 3^{\lg n} = 3c\,n^{\lg 3}. \tag{5}$$
Relation (5) shows that the running time for multiplication can be reduced from order $n^2$ to order $n^{\lg 3} \approx n^{1.585}$, and of course this is a much faster algorithm when $n$ is large.

(A similar but more complicated
method for doing multiplication with running time of order $n^{\lg 3}$ was apparently first suggested by A. Karatsuba and Yu. Ofman, Doklady Akad. Nauk SSSR 145 (1962), 293–294. Curiously, this idea does not seem to have been discovered before 1962; none of the "calculating prodigies" who have become famous for their ability to multiply large numbers mentally have been reported to use any such method, although formula (2) adapted to decimal notation would seem to lead to a reasonably easy way to multiply eight-digit numbers in one's head.)

The
running time can be reduced still further, in the limit as $n$ approaches infinity, if we observe that the method just used is essentially the special case $r = 1$ of a more general method that yields
$$T\bigl((r+1)n\bigr) \le (2r + 1)T(n) + cn \tag{6}$$
for any fixed $r$. This more general method can be obtained as follows: Let
$$u = (u_{(r+1)n-1} \ldots u_1 u_0)_2 \quad\text{and}\quad v = (v_{(r+1)n-1} \ldots v_1 v_0)_2$$
be broken into $r + 1$ pieces,
$$u = U_r 2^{rn} + \cdots + U_1 2^n + U_0, \qquad v = V_r 2^{rn} + \cdots + V_1 2^n + V_0, \tag{7}$$
where each $U_j$ and each $V_j$ is an $n$-bit number. Consider the polynomials
$$U(x) = U_r x^r + \cdots + U_1 x + U_0, \qquad V(x) = V_r x^r + \cdots + V_1 x + V_0, \tag{8}$$
and let
$$W(x) = U(x)V(x) = W_{2r} x^{2r} + \cdots + W_1 x + W_0. \tag{9}$$
Since $u = U(2^n)$ and $v = V(2^n)$, we have $uv = W(2^n)$, so we can easily compute $uv$ if we know the coefficients of $W(x)$. The problem is to find a good way to compute the coefficients of $W(x)$ by using only $2r + 1$ multiplications
of $n$-bit numbers plus some further operations which involve only an execution time proportional to $n$. This can be done by computing
$$U(0)V(0) = W(0), \quad U(1)V(1) = W(1), \quad \ldots, \quad U(2r)V(2r) = W(2r). \tag{10}$$
The coefficients of a polynomial of degree $2r$ can be written as a linear combination of the values of that polynomial at $2r + 1$ distinct points; such a linear combination requires an execution time at most proportional to $n$. (Actually, the products $U(j)V(j)$ are not strictly products of $n$-bit numbers, but they are products of at most $(n + t)$-bit numbers, where $t$ is a fixed value depending on $r$. It is easy to design a multiplication routine […])

Relation (6) can be used to show that
$$T(n) \le c_3\, n^{\log_{r+1}(2r+1)} < c_3\, n^{1 + \log_{r+1} 2},$$
using a method analogous to the derivation of (5). The second inequality holds because $(r+1)^{1 + \log_{r+1} 2} = 2r + 2 > 2r + 1$, and $\log_{r+1} 2$ can be made as small as we please by taking $r$ large enough. So we have now proved:

Theorem A. Given $\epsilon > 0$, there exists a constant $c(\epsilon)$ and a multiplication algorithm such that the number of elementary operations $T(n)$ needed to multiply two $n$-bit numbers satisfies
$$T(n) < c(\epsilon)\, n^{1+\epsilon}. \tag{11}$$
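The recursive process built on formula (2) can be sketched in Python as follows. Its running time obeys (3), and hence (5). Python integers stand in for bit strings. The cutoff below which the built-in product is used (64 bits by default) is a hypothetical tuning parameter. The signed middle product $(U_1 - U_0)(V_0 - V_1)$ of (2) is handled by recursing on absolute values and then restoring the sign.

```python
def karatsuba(u: int, v: int, threshold: int = 64) -> int:
    """Product of nonnegative integers u and v, computed with formula (2).

    With u = 2^n U1 + U0 and v = 2^n V1 + V0 as in (1),
        uv = (2^(2n) + 2^n) U1 V1 + 2^n (U1 - U0)(V0 - V1) + (2^n + 1) U0 V0,
    so only three products of (roughly) n-bit numbers are needed.
    threshold must be at least 1 so that the recursion terminates.
    """
    if max(u.bit_length(), v.bit_length()) <= threshold:
        return u * v                      # base case: a "short" multiplication
    n = max(u.bit_length(), v.bit_length()) // 2
    mask = (1 << n) - 1
    U1, U0 = u >> n, u & mask
    V1, V0 = v >> n, v & mask
    high = karatsuba(U1, V1, threshold)   # U1 V1
    low = karatsuba(U0, V0, threshold)    # U0 V0
    d1, d2 = U1 - U0, V0 - V1             # either factor may be negative
    middle = karatsuba(abs(d1), abs(d2), threshold)
    if (d1 < 0) != (d2 < 0):
        middle = -middle                  # (U1 - U0)(V0 - V1)
    # Shifts and additions only, exactly as in the right-hand side of (2).
    return (high << (2 * n)) + (high << n) + (middle << n) + (low << n) + low


print(karatsuba(1234, 2341, threshold=4))  # 2888794
```

Each call above the cutoff makes exactly three recursive multiplications on numbers of about half the length, which is what relation (3) expresses.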
This theorem is still not the result we are after. It is unsatisfactory for practical purposes in that the method becomes much more complicated as $\epsilon \to 0$ (and therefore as $r \to \infty$), causing $c(\epsilon)$ to grow so rapidly that extremely huge values of $n$ are needed before we have any significant improvement over (5). And it is unsatisfactory for theoretical purposes because it does not make use of the full power of the polynomial method on which it is based. We can obtain a better result if we let $r$ vary with $n$, choosing larger and larger values of $r$ as $n$ increases. This idea is due to A. L. Toom [Doklady Akademiia Nauk SSSR 150 (1963), 496–498; tr. into English in Soviet Mathematics 3 (1963), 714–716], who used it to show that computer circuitry for multiplication of $n$-bit numbers can be constructed involving a fairly small number of components as $n$ grows. S. A. Cook [On the minimum computation time of functions (Thesis, Harvard University, 1966), 51–77] later showed how Toom's method can be adapted to fast computer programs.

Before
we discuss the Toom–Cook algorithm any further, let us study a small example of the transition from $U(x)$ and $V(x)$ to the coefficients of $W(x)$. This example will not demonstrate the efficiency of the method, since the numbers are too small, but it points out some useful simplifications that we can make in the general case. Suppose that we want to multiply $u = 1234$ times $v = 2341$; in binary notation this is $u = (0100\,1101\,0010)_2$ and $v = (1001\,0010\,0101)_2$. Taking $r = 2$ and breaking each number into three 4-bit pieces gives $U(x) = 4x^2 + 13x + 2$ and $V(x) = 9x^2 + 2x + 5$. Hence we find, for $W(x) = U(x)V(x)$,
$$\begin{array}{lllll}
U(0) = 2, & U(1) = 19, & U(2) = 44, & U(3) = 77, & U(4) = 118;\\
V(0) = 5, & V(1) = 16, & V(2) = 45, & V(3) = 92, & V(4) = 157;\\
W(0) = 10, & W(1) = 304, & W(2) = 1980, & W(3) = 7084, & W(4) = 18526.
\end{array} \tag{12}$$
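The values in (12) can be reproduced with a few lines of Python. The split into $r + 1 = 3$ pieces of 4 bits each is inferred from those values. `split_pieces` and `evaluate` are hypothetical helper names. `evaluate` uses the same nested (Horner) form that step C4 of the algorithm below uses for $U(j)$ and $V(j)$.

```python
def split_pieces(x, q, count):
    """The q-bit pieces X_0, X_1, ..., X_{count-1} of x, least significant first."""
    mask = (1 << q) - 1
    return [(x >> (q * i)) & mask for i in range(count)]


def evaluate(coeffs, x):
    """coeffs[0] + coeffs[1]*x + ... by nested multiplication (Horner's rule)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


r, q = 2, 4                                 # three 4-bit pieces, as in the example
U = split_pieces(1234, q, r + 1)            # [2, 13, 4]: U(x) = 4x^2 + 13x + 2
V = split_pieces(2341, q, r + 1)            # [5, 2, 9]:  V(x) = 9x^2 + 2x + 5
Uj = [evaluate(U, j) for j in range(2 * r + 1)]
Vj = [evaluate(V, j) for j in range(2 * r + 1)]
Wj = [a * b for a, b in zip(Uj, Vj)]        # the 2r + 1 products of (10)
print(Uj)  # [2, 19, 44, 77, 118]
print(Vj)  # [5, 16, 45, 92, 157]
print(Wj)  # [10, 304, 1980, 7084, 18526]
```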
Our job now is to compute the five coefficients of $W(x)$ from the latter five values.

There is an attractive little algorithm which can be used to compute the coefficients of a polynomial $W(x) = W_{m-1}x^{m-1} + \cdots + W_1 x + W_0$ when the values $W(0), W(1), \ldots, W(m-1)$ are given: Let us first write
$$W(x) = \theta_{m-1} x^{\underline{m-1}} + \theta_{m-2} x^{\underline{m-2}} + \cdots + \theta_1 x^{\underline{1}} + \theta_0, \tag{13}$$
where $x^{\underline{k}} = x(x-1)\ldots(x-k+1)$, and where the $\theta_j$ are unknown as well as the $W_j$. Now
$$W(x+1) - W(x) = (m-1)\theta_{m-1} x^{\underline{m-2}} + (m-2)\theta_{m-2} x^{\underline{m-3}} + \cdots + \theta_1,$$
and by induction we find that for all $k \ge 0$
$$\begin{aligned}
\frac{1}{k!}\Bigl(W(x+k) - \binom{k}{1}W(x+k-1) &+ \binom{k}{2}W(x+k-2) - \cdots + (-1)^k W(x)\Bigr)\\
&= \binom{m-1}{k}\theta_{m-1}x^{\underline{m-1-k}} + \binom{m-2}{k}\theta_{m-2}x^{\underline{m-2-k}} + \cdots + \binom{k}{k}\theta_k.
\end{aligned} \tag{14}$$
Denoting the left-hand side of (14) in the customary way as $(1/k!)\,\Delta^k W(x)$, we see that
$$\frac{1}{k!}\Delta^k W(x) = \frac{1}{k}\Bigl(\frac{1}{(k-1)!}\Delta^{k-1}W(x+1) - \frac{1}{(k-1)!}\Delta^{k-1}W(x)\Bigr)$$
and $(1/k!)\,\Delta^k W(0) = \theta_k$ (set $x = 0$ in (14); every $0^{\underline{j}}$ with $j > 0$ vanishes). So the coefficients $\theta_j$ can be evaluated using a very simple method, illustrated here for the polynomial $W(x)$ in (12):
$$\begin{array}{ccccc}
10 \\
 & 294 \\
304 & & 1382/2 = 691 \\
 & 1676 & & 1023/3 = 341 \\
1980 & & 3428/2 = 1714 & & 144/4 = 36 \\
 & 5104 & & 1455/3 = 485 \\
7084 & & 6338/2 = 3169 \\
 & 11442 \\
18526
\end{array} \tag{15}$$
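Two short Python loops can mirror the tableau (15) and also the conversion from the $\theta$'s back to ordinary coefficients that (16) and (17) develop. This is a sketch that assumes the values come from a polynomial with integer coefficients. The loop orders follow steps C7 and C8 of the Toom–Cook algorithm given later in this section, and the function names are hypothetical.

```python
def thetas_from_values(values):
    """theta_0, ..., theta_{m-1} of (13) from W(0), ..., W(m-1), via tableau (15).

    Same loop order as step C7: for j = 1, 2, ..., and t from the top down
    to j, replace W(t) by (W(t) - W(t-1)) / j.  Each division is exact."""
    w = list(values)
    for j in range(1, len(w)):
        for t in range(len(w) - 1, j - 1, -1):
            diff = w[t] - w[t - 1]
            assert diff % j == 0, "values must come from an integer polynomial"
            w[t] = diff // j
    return w


def coefficients_from_thetas(thetas):
    """Ordinary coefficients W_0, ..., W_{m-1} from the thetas, as in (16)-(17).

    Same loop order as step C8: for j from the top down to 1, and t from j up,
    replace W(t) by W(t) - j*W(t+1); each pass multiplies the partial
    polynomial by (x - j) and adds theta_j."""
    w = list(thetas)
    top = len(w) - 1                          # plays the role of 2r
    for j in range(top - 1, 0, -1):
        for t in range(j, top):
            w[t] -= j * w[t + 1]
    return w                                  # w[i] is the coefficient of x^i


thetas = thetas_from_values([10, 304, 1980, 7084, 18526])  # W(0..4) from (12)
coeffs = coefficients_from_thetas(thetas)
product = sum(c << (4 * i) for i, c in enumerate(coeffs))  # W(16), by shifting
print(thetas)   # [10, 294, 691, 341, 36]
print(coeffs)   # [10, 69, 64, 125, 36]
print(product)  # 2888794, which equals 1234 * 2341
```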
The leftmost column of this tableau is a listing of the given values of $W(0), W(1), \ldots, W(4)$; the $k$th succeeding column is obtained by computing the difference between successive values of the preceding column and dividing by $k$. The coefficients $\theta_j$ appear at the top of the columns, so that $\theta_0 = 10$, $\theta_1 = 294$, $\ldots$, $\theta_4 = 36$, and we have
$$\begin{aligned}
W(x) &= 36x^{\underline{4}} + 341x^{\underline{3}} + 691x^{\underline{2}} + 294x^{\underline{1}} + 10\\
&= \bigl(((36(x-3) + 341)(x-2) + 691)(x-1) + 294\bigr)x + 10.
\end{aligned} \tag{16}$$
In general, we can write
$$W(x) = \Bigl(\ldots\bigl((\theta_{m-1}(x - m + 2) + \theta_{m-2})(x - m + 3) + \theta_{m-3}\bigr)(x - m + 4) + \cdots + \theta_1\Bigr)x + \theta_0,$$
and this formula shows how the coefficients $W_{m-1}, \ldots, W_1, W_0$ can be obtained from the $\theta$'s:
$$\begin{array}{rrrrr}
36 & 341 \\
 & -3\cdot 36 \\ \hline
36 & 233 & 691 \\
 & -2\cdot 36 & -2\cdot 233 \\ \hline
36 & 161 & 225 & 294 \\
 & -1\cdot 36 & -1\cdot 161 & -1\cdot 225 \\ \hline
36 & 125 & 64 & 69 & 10
\end{array} \tag{17}$$
Here the numbers below the horizontal lines successively show the coefficients of the polynomials $36(x-3) + 341$, $\bigl(36(x-3) + 341\bigr)(x-2) + 691$, and so on, as in (16).

From this tableau we have
$$W(x) = 36x^4 + 125x^3 + 64x^2 + 69x + 10,$$
so the answer to our original problem is $1234 \cdot 2341 = W(16) = 2888794$, where $W(16)$ is obtained by adding and shifting. A generalization of this method for obtaining coefficients is discussed in Section 4.6.4.

The basic Stirling number
identity,
$$x^n = {n \brace n} x^{\underline{n}} + \cdots + {n \brace 1} x^{\underline{1}} + {n \brace 0},$$
Eq. 1.2.6–41, shows that if the coefficients of $W(x)$ are nonnegative, so are the numbers $\theta_j$, and in such a case all of the intermediate results in the above computation are nonnegative. This further simplifies the Toom–Cook multiplication algorithm, which we will now consider in detail.

[…] which are manipulated during this algorithm:

Stack $U$, $V$: Temporary storage of $U(j)$ and $V(j)$ in step C4.
Stack $C$: Numbers to be multiplied, and control codes.
Stack $W$: Storage of $W(j)$.

These stacks
may contain either binary numbers or special symbols called code-1, code-2, code-3, and code-4. The algorithm also constructs an auxiliary table of numbers $q_k$, $r_k$; this table is maintained in such a manner that it may be stored as a linear list, and all accesses to this table are made in a simple manner so that a single pointer (which traverses the list, moving back and forth) may be used to access the current table entry of interest.

(Stacks $C$ and $W$ in this algorithm are used to control the recursive mechanism of the multiplication algorithm in a reasonably straightforward manner which is a special case of the general procedures discussed
in Chapter 8.)

C1. [Compute $q$, $r$ tables.] Set stacks $U$, $V$, $C$, and $W$ empty. Set
$$k \leftarrow 1, \quad q_0 \leftarrow q_1 \leftarrow 16, \quad r_0 \leftarrow r_1 \leftarrow 4, \quad Q \leftarrow 4, \quad R \leftarrow 2.$$
Now if $q_{k-1} + q_k < n$, set
$$k \leftarrow k + 1, \quad Q \leftarrow Q + R, \quad R \leftarrow \lfloor \sqrt{Q} \rfloor, \quad q_k \leftarrow 2^Q, \quad r_k \leftarrow 2^R,$$
and repeat this operation until $q_{k-1} + q_k \ge n$. (Note: The calculation of $R \leftarrow \lfloor \sqrt{Q} \rfloor$ does not require a square root to be taken, since we may simply set $R \leftarrow R + 1$ if $(R + 1)^2 \le Q$ and leave $R$ unchanged if $(R + 1)^2 > Q$; see exercise 2. In this step we build the sequence
$$\begin{array}{r|cccccccc}
k & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \ldots\\
q_k & 2^4 & 2^4 & 2^6 & 2^8 & 2^{10} & 2^{13} & 2^{16} & \ldots\\
r_k & 2^2 & 2^2 & 2^2 & 2^2 & 2^3 & 2^3 & 2^4 & \ldots
\end{array}$$
The multiplication of 70000-bit numbers would cause this step to terminate with $k = 6$, since $70000 < 2^{13} + 2^{16}$.)

C2. [Put $u$, $v$ on stack.] Put code-1 on stack $C$, then place $u$ and $v$ onto stack $C$ as numbers of exactly $q_{k-1} + q_k$ bits each.

C3. [Check recursion level.] Decrease $k$ by 1. If $k = 0$, the top of stack $C$ contains two 32-bit numbers, $u$ and $v$; set $w \leftarrow uv$ using a built-in routine for multiplying 32-bit numbers, and go to step C10. If $k > 0$, set $r \leftarrow r_k$, $q \leftarrow q_k$, $p \leftarrow q_{k-1} + q_k$ […]
C4. [Break into $r + 1$ parts.] Let the number at the top of stack $C$ be regarded as a list of $r + 1$ numbers with $q$ bits each, $(U_r \ldots U_1 U_0)_{2^q}$. (The top of stack $C$ now contains an $(r + 1)q = (q_k + q_{k+1})$-bit number.) For $j = 0, 1, \ldots, 2r$ compute the $p$-bit numbers
$$(\ldots(U_r j + U_{r-1})j + \cdots + U_1)j + U_0 = U(j)$$
and successively put these values onto stack $U$. (The bottom of stack $U$ now contains $U(0)$, then comes $U(1)$, etc., with $U(2r)$ on top. Note that
$$U(j) \le U(2r) < 2^q\bigl((2r)^r + (2r)^{r-1} + \cdots + 1\bigr) < 2^{q+1}(2r)^r \le 2^p,$$
by exercise 3.) Then remove $U_r \ldots U_1 U_0$ from stack $C$.

Now the top of stack $C$ contains another list of $r + 1$ $q$-bit numbers, $V_r \ldots V_1 V_0$, and the $p$-bit numbers
$$(\ldots(V_r j + V_{r-1})j + \cdots + V_1)j + V_0 = V(j)$$
should be put onto stack $V$ in the same way. After this has been done, remove $V_r \ldots V_1 V_0$ from stack $C$.

C5. [Recurse.] Successively put the following items onto stack $C$, at the same time emptying stacks $U$ and $V$:
$$\text{code-2},\ V(2r),\ U(2r),\ \text{code-3},\ V(2r-1),\ U(2r-1),\ \ldots,\ \text{code-3},\ V(1),\ U(1),\ \text{code-3},\ V(0),\ U(0).$$
Put code-4 onto stack $W$. Go back to step C3.

C6. [Save one product.]
(At this point the multiplication algorithm has set $w$ to one of the products $W(j) = U(j)V(j)$.) Put $w$ onto stack $W$. (This number $w$ contains $2(q_k + q_{k-1})$ bits.) Go back to step C3.

Fig. 8. Toom–Cook algorithm for high-precision multiplication.

C7. [Find
12784  |ε|≤u'|πs.] Set |εr|4|¬L|4r|βk, q|4|¬L|4q|βk, 
12788  p|4|¬L|4q|βk|βα_↓|β1|4α+↓|4q|βk. (|πAt this point 
12792  stack |εW |πcontains|4.|4.|4.|4, |πcode-4, |εW(0), 
12797  W(1),|4.|4.|4.|4,|4W(2r) |πfrom bottom to top, 
12802  where each |εW(j) |πis a 2|εp-|πbit number.)|'
Now for $j = 1, 2, 3, \ldots, 2r$, perform the following loop: For $t = 2r, 2r-1, 2r-2, \ldots, j$ set $W(t) \leftarrow \bigl(W(t) - W(t-1)\bigr)/j$. (Here $j$ must increase and $t$ must decrease. The quantity $\bigl(W(t) - W(t-1)\bigr)/j$ will always be a nonnegative integer which fits in $2p$ bits; cf. (15).)

C8. [Find $W$'s.] For $j = 2r-1, 2r-2, \ldots, 1$, perform the following loop: For $t = j, j+1, \ldots, 2r-1$ set $W(t) \leftarrow W(t) - jW(t+1)$. (Here $j$ must decrease and $t$ must increase. The result of this operation will again be a nonnegative $2p$-bit integer; cf. (17).)

C9. [Set answer.] Set $w$ to the $2(q_k+q_{k+1})$-bit integer

$$\Bigl(\ldots\bigl(W(2r)2^q + W(2r-1)\bigr)2^q + \cdots\Bigr)2^q + W(0).$$

Remove $W(2r), \ldots, W(0)$ and code-4 from stack $W$.

C10. [Return.] Set $k \leftarrow k+1$. Remove the top of stack $C$. If it is code-3, go to C6. If it is code-2, put $w$ onto stack $W$ and go to C7. And if it is code-1, terminate the algorithm ($w$ is the answer).

Let us now estimate the running time, $T(n)$, for Algorithm C, in terms of some things we shall call "cycles," i.e., elementary machine operations. Step C1 takes $O(q_k)$ cycles, even if we represent the number $q_k$ internally as a long string of $q_k$ bits followed by some delimiter, since $q_k + q_{k-1} + \cdots + q_0$ will be $O(q_k)$. Step C2 obviously takes $O(q_k)$ cycles.

Now let $t_k$ denote the amount of computation required to get from step C3 to step C10 for a particular value of $k$ (after $k$ has been decreased at the beginning of step C3). Step C3 requires $O(q)$ cycles at most. Step C4 involves $r$ multiplications of a $\lg(r+1)$-bit number by a $p$-bit number, and $r$ additions of $p$-bit numbers, all repeated $4r+2$ times. Thus we need a total of $O(r^2q\log r)$ cycles. Step C5 requires moving $4r+2$ $p$-bit numbers, so it involves $O(rq)$ cycles. Step C6 requires $O(q)$ cycles, and it is done $2r+1$ times per iteration. The recursion involved when the algorithm essentially invokes itself (by returning to step C3) requires $t_{k-1}$ cycles, $2r+1$ times. Step C7 requires $O(r^2)$ subtractions of $p$-bit numbers and divisions of $2p$-bit by $(\lg r)$-bit numbers, so it requires $O(r^2q\log r)$ cycles. Similarly, step C8 requires $O(r^2q\log r)$ cycles. Step C9 involves $O(rq)$ cycles, and C10 takes hardly any time at all.

Summing up, we have, for $q = q_k$ and $r = r_k$, $T(n) = O(q_k) + O(q_k) + t_{k-1}$, where

$$\begin{aligned}
t_k &= O(q) + O(r^2q\log r) + O(rq) + (2r+1)O(q) + O(r^2q\log r)\\
&\qquad + O(r^2q\log r) + O(rq) + O(q) + (2r+1)t_{k-1}\\
&= O(r^2q\log r) + (2r+1)t_{k-1};
\end{aligned}$$

thus there is a constant $c$ such that

$$t_k \le c\,r_k^2q_k\lg r_k + (2r_k+1)t_{k-1}.$$
To complete the estimation of $t_k$ we can prove by brute force that

$$t_k \le Cq_{k+1}2^{2.5\sqrt{\lg q_{k+1}}} \tag{18}$$

for some constant $C$. Let us choose $C > 20c$, and let us also take $C$ large enough so that (18) is valid for $k \le k_0$, where $k_0$ will be specified below. Then when $k > k_0$, let $Q_k = \lg q_k$, $R_k = \lg r_k$; we have by induction

$$\begin{aligned}
t_k &\le cq_kr_k^2\lg r_k + (2r_k+1)Cq_k2^{2.5\sqrt{Q_k}}\\
&= Cq_{k+1}2^{2.5\sqrt{\lg q_{k+1}}}(\eta_1+\eta_2),
\end{aligned}$$

where

$$\begin{aligned}
\eta_1 &= \frac{c}{C}R_k2^{R_k-2.5\sqrt{Q_{k+1}}} < \frac{1}{20}R_k2^{-R_k} < 0.05,\\
\eta_2 &= \Bigl(2+\frac{1}{r_k}\Bigr)2^{2.5(\sqrt{Q_k}-\sqrt{Q_{k+1}})} \to 2^{-1/4} < 0.85,
\end{aligned}$$

since

$$\sqrt{Q_{k+1}} - \sqrt{Q_k} = \sqrt{Q_k + \lfloor\sqrt{Q_k}\rfloor} - \sqrt{Q_k} \to \tfrac12$$

as $k \to \infty$. It follows that we can find $k_0$ such that $\eta_2 < 0.95$, and hence $\eta_1 + \eta_2 < 1$, for all $k > k_0$; this completes the proof of (18) by induction.

Finally, therefore, we may compute $T(n)$; since $n > q_{k-1} + q_{k-2}$, we have $q_{k-1} < n$; hence

$$r_{k-1} = 2^{\lfloor\sqrt{\lg q_{k-1}}\rfloor} < 2^{\sqrt{\lg n}} \qquad\text{and}\qquad q_k = r_{k-1}q_{k-1} < n2^{\sqrt{\lg n}}.$$

Thus

$$t_{k-1} \le Cq_k2^{2.5\sqrt{\lg q_k}} < Cn2^{\sqrt{\lg n}+2.5(\sqrt{\lg n}+1)},$$

and, since $T(n) = O(q_k) + t_{k-1}$, we have finally the following theorem:

Theorem C. There is a constant $c_0$ such that the execution time of Algorithm C is less than $c_0n2^{3.5\sqrt{\lg n}}$ cycles.
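To make steps C4 and C7 through C9 concrete, here is a minimal Python sketch of one level of the splitting, under stated assumptions. The stacks and the recursion of steps C3, C5, C6, and C10 are left out. The $2r+1$ products $W(j) = U(j)V(j)$ are formed with Python's built-in multiplication instead of a recursive call. The function name `toom_cook_level` and its parameters are hypothetical. The sketch evaluates at $j = 0, 1, \ldots, 2r$ by the nested rule of step C4, then runs the two loops of steps C7 and C8 and the final assembly of step C9.

```python
def toom_cook_level(u: int, v: int, q: int, r: int) -> int:
    """One level of Toom-Cook: u, v < 2**((r+1)*q) are split into r+1 parts of q bits."""
    mask = (1 << q) - 1
    U = [(u >> (q * i)) & mask for i in range(r + 1)]  # U_0, ..., U_r
    V = [(v >> (q * i)) & mask for i in range(r + 1)]  # V_0, ..., V_r

    def at(coeffs, j):
        # Step C4: (...(U_r j + U_{r-1}) j + ... + U_1) j + U_0
        acc = 0
        for c in reversed(coeffs):
            acc = acc * j + c
        return acc

    # Steps C5-C6, simplified: W(j) = U(j) V(j) for j = 0, 1, ..., 2r
    W = [at(U, j) * at(V, j) for j in range(2 * r + 1)]

    # Step C7: W(t) <- (W(t) - W(t-1)) / j, with j increasing and t decreasing
    for j in range(1, 2 * r + 1):
        for t in range(2 * r, j - 1, -1):
            diff = W[t] - W[t - 1]
            assert diff % j == 0  # the division is always exact
            W[t] = diff // j

    # Step C8: W(t) <- W(t) - j W(t+1), with j decreasing and t increasing
    for j in range(2 * r - 1, 0, -1):
        for t in range(j, 2 * r):
            W[t] -= j * W[t + 1]

    # Step C9: w = (...(W(2r) 2^q + W(2r-1)) 2^q + ...) 2^q + W(0)
    w = 0
    for t in range(2 * r, -1, -1):
        w = (w << q) + W[t]
    return w
```

For example, `toom_cook_level(u, v, 8, 2)` treats $u$ and $v$ as three 8-bit parts each, evaluates at five points, and should return $uv$ exactly. Replacing the built-in product by a recursive call on $U(j)$ and $V(j)$ is roughly what gives the recursive structure whose cost Theorem C bounds.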
Theorem C is noticeably stronger than Theorem A, since $n2^{3.5\sqrt{\lg n}} = n^{1+3.5/\sqrt{\lg n}}$. By adding a few complications to the algorithm, pushing the ideas to their apparent limits (see exercise 5), we can improve the estimated execution time to

$$T(n) = O\bigl(n2^{\sqrt{2\lg n}}\log n\bigr). \tag{19}$$

B. A modular method. There is another way to multiply large numbers very rapidly, based on the ideas of modular arithmetic as presented in Section 4.3.2. It is very hard to believe at first that this method can be of advantage, since a multiplication algorithm based on modular arithmetic must include the choice of moduli and the conversion of numbers into and out of modular representation, besides the actual multiplication operation itself. In spite of these formidable difficulties, A. Schönhage …
folio 382 galley 13
In order to understand the essential mechanism of Schönhage's method, we shall look at a special case. Consider the sequence defined by the rules

$$q_0 = 1, \qquad q_{k+1} = 3q_k - 1, \tag{20}$$

so that $q_k = 3^k - 3^{k-1} - \cdots - 1 = \frac12(3^k+1)$. We will study a procedure that multiplies $(18q_k+8)$-bit numbers, in terms of a method for multiplying $(18q_{k-1}+8)$-bit numbers. Thus, if we know how to multiply numbers having $(18q_0+8) = 26$ bits, the procedure to be described will show us how to multiply numbers of $(18q_1+8) = 44$ bits, then 98 bits, then 260 bits, etc., eventually increasing the number of bits by almost a factor of 3 at each step.

Let $p_k = 18q_k + 8$. When multiplying $p_k$-bit numbers, the idea is to use the six moduli

$$\begin{aligned}
m_1 &= 2^{6q_k-1}-1, & m_2 &= 2^{6q_k+1}-1, & m_3 &= 2^{6q_k+2}-1,\\
m_4 &= 2^{6q_k+3}-1, & m_5 &= 2^{6q_k+5}-1, & m_6 &= 2^{6q_k+7}-1.
\end{aligned} \tag{21}$$

These moduli are relatively prime, by Eq. 4.3.2–(18), since the exponents

$$6q_k-1,\quad 6q_k+1,\quad 6q_k+2,\quad 6q_k+3,\quad 6q_k+5,\quad 6q_k+7 \tag{22}$$

are always relatively prime (see exercise 6). The six moduli in (21) are capable of representing numbers up to $m = m_1m_2m_3m_4m_5m_6 > 2^{36q_k+16} = 2^{2p_k}$, so there is no chance of overflow in the multiplication of $p_k$-bit numbers $u$ and $v$. Thus we may use the following method:

a) Compute $u_1 = u \bmod m_1, \ldots, u_6 = u \bmod m_6$; $v_1 = v \bmod m_1, \ldots, v_6 = v \bmod m_6$.
b) Multiply $u_1$ by $v_1$, $u_2$ by $v_2$, $\ldots$, $u_6$ by $v_6$. These are numbers of at most $6q_k+7 = 18q_{k-1}+1 < p_{k-1}$ bits, so the multiplications can be performed by using the assumed $p_{k-1}$-bit multiplication procedure.
c) Compute $w_1 = u_1v_1 \bmod m_1$, $w_2 = u_2v_2 \bmod m_2$, $\ldots$, $w_6 = u_6v_6 \bmod m_6$.
d) Compute $w$ such that $0 \le w < m$, $w \bmod m_1 = w_1$, $\ldots$, $w \bmod m_6 = w_6$.
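Steps (a) through (d) can be tried out directly with Python's unbounded integers. The sketch below illustrates only the bookkeeping, not Schönhage's fast conversion. It forms the products of step (b) with the built-in `*` and reduces with `%`, rather than with the end-around-carry method of Section 4.3.2. It performs step (d) with the mixed-radix formulas (26) and (27) given later in this section, obtaining each $c_{ij}$ from Python's `pow(x, -1, m)` (Python 3.8 or later). All function names are hypothetical. The helper `check_moduli` tests the two claims about (21) and (22): the exponents are pairwise coprime, and $m > 2^{2p_k}$.

```python
from math import gcd, prod

def schonhage_moduli(k: int):
    """q_k from (20), p_k = 18 q_k + 8, the exponents (22), and the moduli (21)."""
    q = (3**k + 1) // 2
    exps = [6*q - 1, 6*q + 1, 6*q + 2, 6*q + 3, 6*q + 5, 6*q + 7]
    return q, 18*q + 8, exps, [(1 << e) - 1 for e in exps]

def check_moduli(k: int) -> bool:
    """The exponents (22) are pairwise coprime, and m_1 ... m_6 > 2**(2 p_k)."""
    q, p, exps, ms = schonhage_moduli(k)
    coprime = all(gcd(x, y) == 1 for i, x in enumerate(exps) for y in exps[i + 1:])
    return coprime and prod(ms) > 1 << (2 * p)

def mixed_radix_crt(ws, ms):
    """Step (d) by (26) and (27): the w with 0 <= w < m_1...m_r and w ≡ w_j (mod m_j)."""
    wp = []                                  # w'_1, ..., w'_r
    for j, (w_j, m_j) in enumerate(zip(ws, ms)):
        x = w_j % m_j
        for i in range(j):
            c_ij = pow(ms[i], -1, m_j)       # c_ij m_i ≡ 1 (modulo m_j)
            x = (x - wp[i]) * c_ij % m_j
        wp.append(x)
    w = wp[-1]
    for j in range(len(ms) - 2, -1, -1):    # (27), innermost parentheses first
        w = w * ms[j] + wp[j]
    return w

def modular_multiply(u: int, v: int, k: int) -> int:
    """Steps (a)-(d) for p_k-bit u and v; the products in (b) use Python's built-in *."""
    q, p, exps, ms = schonhage_moduli(k)
    assert 0 <= u < 1 << p and 0 <= v < 1 << p
    us = [u % m for m in ms]                           # (a)
    vs = [v % m for m in ms]
    ws = [(a * b) % m for a, b, m in zip(us, vs, ms)]  # (b) and (c)
    return mixed_radix_crt(ws, ms)                     # (d)
```

The first values of `schonhage_moduli(k)[1]` reproduce the bit lengths 26, 44, 98, 260 quoted above.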
Let $t_k$ be the amount of time needed for this process. It is not hard to see that operation (a) takes $O(p_k)$ cycles, since the determination of $u \bmod (2^e-1)$ is quite simple (like "casting out nines"), as shown in Section 4.3.2. Similarly, operation (c) takes $O(p_k)$ cycles. Operation (b) requires essentially $6t_{k-1}$ cycles. This leaves us with operation (d), which seems to be quite a difficult computation; but Schönhage has found an ingenious way to perform step (d) in $O(p_k\log p_k)$ cycles, and this is the crux of the method. As a consequence, we have

$$t_k = 6t_{k-1} + O(p_k\log p_k).$$

Since $p_k = 3^{k+2} + 17$, we can show that

$$t_k = O(6^k) = O(p_k^{1.63}). \tag{23}$$

(See exercise 7.)

So although this method is more complicated than the $O(n^{\lg 3})$ procedure given at the beginning of the section, it does, in fact, lead to an execution time substantially better than $O(n^2)$ for the multiplication of $n$-bit numbers. Thus we can improve on the classical method by using either of two completely different approaches.

Let us now analyze operation (d) above. Assume that we are given the positive integers $e_1 < e_2 < \cdots < e_r$, relatively prime in pairs; let

$$m_1 = 2^{e_1}-1,\quad m_2 = 2^{e_2}-1,\quad \ldots,\quad m_r = 2^{e_r}-1. \tag{24}$$

We are also given numbers $w_1, \ldots, w_r$ such that $0 \le w_j \le m_j$. Our job is to determine the binary representation of the number $w$ which satisfies the conditions

$$\begin{gathered}
0 \le w < m_1m_2\ldots m_r,\\
w \equiv w_1 \pmod{m_1},\quad \ldots,\quad w \equiv w_r \pmod{m_r}.
\end{gathered} \tag{25}$$

The method is based on (23) and (24) of Section 4.3.2; first we compute

$$w'_j = \Bigl(\ldots\bigl((w_j - w'_1)c_{1j} - w'_2\bigr)c_{2j} - \cdots - w'_{j-1}\Bigr)c_{(j-1)j} \bmod m_j, \tag{26}$$

for $j = 2, \ldots, r$, where $w'_1 = w_1 \bmod m_1$; then we compute

$$w = \Bigl(\ldots\bigl(w'_rm_{r-1} + w'_{r-1}\bigr)m_{r-2} + \cdots + w'_2\Bigr)m_1 + w'_1. \tag{27}$$

Here $c_{ij}$ is a number such that $c_{ij}m_i \equiv 1 \pmod{m_j}$; these numbers $c_{ij}$ are not given; they must be determined from the $e_j$'s.
The calculation of (26) for all $j$ involves $\binom{r}{2}$ additions modulo $m_j$, each of which takes $O(e_r)$ cycles, plus $\binom{r}{2}$ multiplications by $c_{ij}$, modulo $m_j$. The calculation of $w$ by formula (27) involves $r$ additions and $r$ multiplications by $m_j$; it is easy to multiply by $m_j$, since this is just adding, shifting, and subtracting, so it is clear that the evaluation of Eq. (27) takes $O(r^2e_r)$ cycles. We will soon see that each of the multiplications by $c_{ij}$, modulo $m_j$, requires only $O(e_r\log e_r)$ cycles, and therefore the entire job of conversion can be done in $O(r^2e_r\log e_r)$ cycles.

The above observations leave us with the following problem to solve: Given positive integers $e < f$ and a nonnegative integer $u < 2^f$, compute $(cu) \bmod (2^f-1)$, where $c$ is the number such that $(2^e-1)c \equiv 1 \pmod{2^f-1}$; and we must do this in $O(f\log f)$ cycles. The result of exercise 4.3.2–6 gives a formula for $c$ which suggests a procedure that can be used. First we find the least positive integer $b$ such that

$$be \equiv 1 \pmod{f}. \tag{28}$$
This can be done using Euclid's algorithm in $O\bigl((\log f)^3\bigr)$ cycles, since Euclid's algorithm applied to $e$ and $f$ requires $O(\log f)$ iterations, and each iteration requires $O\bigl((\log f)^2\bigr)$ cycles. Alternatively, we could be very sloppy here without violating the total time constraint, by simply trying $b = 1, 2,$ etc. until (28) is satisfied; such a process would take $O(f\log f)$ cycles in all. Once $b$ has been found, exercise 4.3.2–6 tells us that

$$c = c[b] = \Bigl(\sum_{0\le j<b}2^{ej}\Bigr) \bmod (2^f-1). \tag{29}$$

A brute-force multiplication of $(cu) \bmod (2^f-1)$ would not be good enough to solve the problem, since we do not know how to multiply general $f$-bit numbers in $O(f\log f)$ cycles. But the special form of $c$ provides a clue: The binary representation of $c$ is composed of bits in a regular pattern, and Eq. (29) shows that the number $c[2b]$ can be obtained in a simple way from $c[b]$. This suggests that we can rapidly multiply a number $u$ by $c[b]$ if we build $c[b]u$ up in $\lg b$ steps in a suitably clever manner, such as the following: Let the binary notation for $b$ be

$$b = (b_s\ldots b_2b_1b_0)_2;$$

we may calculate the sequences $a_k$, $d_k$, $u_k$, $v_k$ which are defined by the rules

$$\begin{aligned}
a_0 &= e, & a_k &= 2a_{k-1} \bmod f;\\
d_0 &= b_0e, & d_k &= (d_{k-1} + b_ka_k) \bmod f;\\
u_0 &= u, & u_k &= (u_{k-1} + 2^{a_{k-1}}u_{k-1}) \bmod (2^f-1);\\
v_0 &= b_0u, & v_k &= (v_{k-1} + b_k2^{d_{k-1}}u_k) \bmod (2^f-1).
\end{aligned} \tag{30}$$

It is easy to prove by induction on $k$ that

$$\begin{aligned}
a_k &= (2^ke) \bmod f; & d_k &= \bigl((b_k\ldots b_1b_0)_2\,e\bigr) \bmod f;\\
u_k &= (c[2^k]u) \bmod (2^f-1); & v_k &= \bigl(c[(b_k\ldots b_1b_0)_2]\,u\bigr) \bmod (2^f-1).
\end{aligned} \tag{31}$$
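Recurrences (30) translate almost line for line into Python. The sketch below makes several choices of its own. The helper names `rotate` and `times_c` are hypothetical. Multiplication by a power of 2 modulo $2^f-1$ is done as an end-around rotation of an $f$-bit word, and sums are reduced with `%`. The value of $b$ comes from Python's `pow(e, -1, f)` (Python 3.8 or later) instead of from Euclid's algorithm or from trial. Inside the loop the order of updates matters: $u_k$ needs $a_{k-1}$, $v_k$ needs $d_{k-1}$ and $u_k$, and $d_k$ needs $a_k$, exactly as in (30).

```python
def rotate(x: int, r: int, f: int) -> int:
    """2**r * x mod (2**f - 1) for 0 <= x < 2**f: rotate the f-bit word left by r."""
    mask = (1 << f) - 1
    r %= f
    return ((x << r) | (x >> (f - r))) & mask

def times_c(u: int, e: int, f: int) -> int:
    """(c*u) mod (2**f - 1), where (2**e - 1) c ≡ 1, computed by recurrences (30)."""
    mod = (1 << f) - 1
    b = pow(e, -1, f)                             # least positive b with b e ≡ 1 (mod f), (28)
    bits = [(b >> i) & 1 for i in range(b.bit_length())]   # b_0, b_1, ..., b_s
    a, d, uk = e % f, bits[0] * e % f, u          # a_0, d_0, u_0
    v = bits[0] * u                               # v_0
    for k in range(1, len(bits)):
        uk = (uk + rotate(uk, a, f)) % mod        # u_k uses a_{k-1}
        v = (v + bits[k] * rotate(uk, d, f)) % mod  # v_k uses d_{k-1} and u_k
        a = 2 * a % f                             # a_k
        d = (d + bits[k] * a) % f                 # d_k uses a_k
    return v % mod                                # v_s = (c[b] u) mod (2**f - 1)
```

For instance, with $e = 2$ and $f = 5$ we get $b = 3$ and $c = c[3] = 1 + 4 + 16 = 21$, and indeed $3 \cdot 21 = 63 \equiv 1 \pmod{31}$. So `times_c(1, 2, 5)` should give 21.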
Hence the desired result, $(c[b]u) \bmod (2^f-1)$, is $v_s$. The calculation of $a_k$, $d_k$, $u_k$, $v_k$ from $a_{k-1}$, $d_{k-1}$, $u_{k-1}$, $v_{k-1}$ takes $O(\log f) + O(\log f) + O(f) + O(f) = O(f)$ cycles, and therefore the entire calculation can be done in $s\,O(f) = O(f\log f)$ cycles as desired.

The reader will find it instructive to study the ingenious method represented by (30) and (31) very carefully. Similar techniques are discussed in Section 4.6.3.

Schönhage's paper [Computing 1 (1966), 182–196] shows that these ideas can be extended to the multiplication of $n$-bit numbers using $r \approx 2^{\sqrt{2\lg n}}$ moduli, obtaining a method analogous to Algorithm C. We shall not dwell on the details here, since Algorithm C is always superior; in fact, an even better method is next on our agenda.

C. Use of Fourier transforms. The critical problem in high-precision multiplication is the determination of "convolution products" such as

$$u_rv_0 + u_{r-1}v_1 + \cdots + u_0v_r,$$
folio 385 galley 14
and there is an intimate relation between convolutions and finite Fourier transforms. If $\omega = \exp(2\pi i/K)$ is a $K$th root of unity, the one-dimensional Fourier transform of $(u_0, u_1, \ldots, u_{K-1})$ may be defined to be $(\hat u_0, \hat u_1, \ldots, \hat u_{K-1})$, where

$$\hat u_s = \sum_{0\le t<K}\omega^{st}u_t, \qquad 0 \le s < K. \tag{32}$$

Letting $(\hat v_0, \hat v_1, \ldots, \hat v_{K-1})$ be defined in the same way, as the transform of $(v_0, v_1, \ldots, v_{K-1})$, it is not difficult to see that $(\hat u_0\hat v_0, \hat u_1\hat v_1, \ldots, \hat u_{K-1}\hat v_{K-1})$ is the transform of $(w_0, w_1, \ldots, w_{K-1})$, where

$$\begin{aligned}
w_r &= u_rv_0 + u_{r-1}v_1 + \cdots + u_0v_r + u_{K-1}v_{r+1} + \cdots + u_{r+1}v_{K-1}\\
&= \sum_{i+j\equiv r \pmod K}u_iv_j.
\end{aligned}$$

When $K \ge 2n$ and $u_n = u_{n+1} = \cdots = u_{K-1} = v_n = v_{n+1} = \cdots = v_{K-1} = 0$, the $w$'s are just what we need for multiplication: the transform of a convolution product is the ordinary product of the transforms. This idea is a special case of Toom's use of polynomials (cf. (10)), with $x$ replaced by roots of unity.

The above property of Fourier transforms was exploited by V. Strassen in 1968, using a sufficiently precise binary representation of the complex number $\omega$, to multiply large numbers faster than was possible under all previously known schemes. In 1970, he and A. Schönhage found an elegant way to modify the method, avoiding all the complications of complex numbers and obtaining a very pretty algorithm capable of multiplying two $n$-bit numbers in $O(n\log n\log\log n)$ steps. We shall now study their remarkable approach [cf. Computing 7 (1971), 281–292], in a simplified form suggested by V. R. Pratt.
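The convolution property above, that the transform of a cyclic convolution is the product of the transforms, can be checked numerically. The following minimal Python sketch uses floating-point complex arithmetic from the standard `cmath` module with $\omega = \exp(2\pi i/K)$, and it computes each transform directly in $O(K^2)$ steps rather than quickly. The function names are hypothetical. Rounding to the nearest integer is safe here only because the inputs are tiny. The sketch illustrates (32) itself, not Strassen's method, whose precision analysis is not reproduced.

```python
import cmath

def dft(a, sign=1):
    """Transform (32): a_hat[s] = sum_t omega**(s*t) a[t], with omega = exp(sign*2*pi*i/K)."""
    K = len(a)
    omega = cmath.exp(sign * 2j * cmath.pi / K)
    return [sum(omega ** (s * t) * a[t] for t in range(K)) for s in range(K)]

def cyclic_convolution(u, v):
    """w_r = sum of u_i v_j over i + j ≡ r (mod K), via the product of the transforms."""
    K = len(u)
    w_hat = [x * y for x, y in zip(dft(u), dft(v))]   # pointwise product of transforms
    w = dft(w_hat, sign=-1)                           # inverse transform, times K
    return [round((x / K).real) for x in w]           # round away floating-point noise

def cyclic_convolution_direct(u, v):
    """The same sums, computed straight from the definition."""
    K = len(u)
    return [sum(u[i] * v[(r - i) % K] for i in range(K)) for r in range(K)]
```

As an example, take the decimal digits of 1234 and 5678, least significant first and padded with zeros to $K = 8 \ge 2n$ for $n = 4$. `cyclic_convolution([4, 3, 2, 1, 0, 0, 0, 0], [8, 7, 6, 5, 0, 0, 0, 0])` gives `[32, 52, 61, 60, 34, 16, 5, 0]`, and $\sum_r w_r10^r = 7006652 = 1234 \cdot 5678$.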
It is convenient in the first place to replace $n$ by $2^n$, and to seek a procedure that multiplies $2^n$-bit numbers in $O(2^nn\log n)$ steps. Roughly speaking, we shall reduce the problem of multiplying $2^n$-bit numbers to the problem of doing about $2^{n/2}$ multiplications of $2^{n/2}$-bit numbers, with $O(2^nn)$ auxiliary steps required to piece these products together properly; then there will be $\lg n$ levels of recursion with $O(2^nn)$ steps per level, making a total of $O(2^nn\log n)$ steps as desired.

Let $N = 2^n$ and suppose we wish to compute the product of $u$ and $v$, where $0 \le u, v < 2^N$. As in Algorithm C we shall break these $N$-bit numbers into groups; let

$$k + l = n + 1, \qquad K = 2^k, \qquad L = 2^l,$$

and write

$$u = (U_{K/2-1}\ldots U_1U_0)_{2^L}, \qquad v = (V_{K/2-1}\ldots V_1V_0)_{2^L},$$

regarding $u$ and $v$ as $2^{k-1}$ groups of $2^l$-bit numbers. We will select appropriate values for $k$ and $l$ later; it turns out (see exercise 10) that we will need to have

$$4 \le k \le l + 3, \tag{33}$$

but no other conditions. The above representation of $u$ and $v$ implies as before that

$$u \cdot v = W_{K-2}2^{(K-2)L} + \cdots + W_12^L + W_0, \tag{34}$$

where

$$W_r = \sum_{i+j=r}U_iV_j = \sum_{i+j\equiv r \pmod K}U_iV_j, \tag{35}$$

if we define $U_i = V_j = 0$ for $i, j \ge K/2$. Clearly $0 \le W_r \le (K/2)(2^L-1)^2 < 2^{2L+k-1}$; therefore if we knew the $W$'s, we could compute $u \cdot v$ by adding up the terms in (34), in $O\bigl(K(2L+k)\bigr) = O(N)$ further steps.
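Before any Fourier transform enters, the decomposition (34) and (35) can be checked by computing every $W_r$ directly from its definition. That takes $O(K^2)$ group products, so this sketch is a correctness check, not a fast method. The function names are hypothetical. The parameters follow the text: $k + l = n + 1$ with condition (33), and $K/2$ groups of $L$ bits padded by $K/2$ zero groups.

```python
def split_groups(x: int, K: int, L: int) -> list:
    """The K/2 L-bit groups of x, least significant first, padded with K/2 zero groups."""
    mask = (1 << L) - 1
    return [(x >> (L * i)) & mask for i in range(K // 2)] + [0] * (K // 2)

def multiply_via_W(u: int, v: int, n: int, k: int) -> int:
    """u*v for 0 <= u, v < 2**(2**n), by forming every W_r of (35) and summing (34)."""
    l = n + 1 - k
    assert 4 <= k <= l + 3                       # condition (33)
    K, L, N = 1 << k, 1 << l, 1 << n
    assert 0 <= u < 1 << N and 0 <= v < 1 << N
    U, V = split_groups(u, K, L), split_groups(v, K, L)
    # (35): W_r = sum of U_i V_j over i + j ≡ r (mod K); since U_i = V_j = 0
    # for i, j >= K/2, every nonzero term has i + j <= K - 2, so nothing wraps around
    W = [sum(U[i] * V[(r - i) % K] for i in range(K)) for r in range(K)]
    assert all(0 <= Wr < 1 << (2 * L + k - 1) for Wr in W)   # the bound stated after (35)
    return sum(Wr << (r * L) for r, Wr in enumerate(W))      # (34); W_{K-1} is 0
```

With $n = 12$, $k = 8$, $l = 5$ this is exactly the 4096-bit example discussed next: 128 groups of 32 bits, 256 convolution products, each below $2^{71}$.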
Our goal is to compute the $W_r$ exactly; and we can do this by computing their value mod $M$, where $M$ is any number larger than $(K/2)(2^L-1)^2$. The key idea is that we can choose $M = 2^{4L}+1$, and compute the $W_r$ by doing a "fast Fourier transform" modulo $M$, where the $K$th root of unity $\omega$ we use is a power of 2 (so that multiplication by powers of $\omega$ is very simple).

Before discussing this idea in detail, a numerical example of what we've said so far may help to fix the ideas. Suppose that we want to multiply two 4096-bit numbers, obtaining an 8192-bit product; thus $n = 12$ in the above discussion, and we may choose $k = 8$, $l = 5$. The bits of $u$ and $v$ are partitioned into 128 groups of 32 bits each, and the basic idea is to find the 256 convolution products (35) and to add them together (after appropriate shifting). The convolution products have at most $64 + 7 = 71$ bits each, so it surely suffices to determine them modulo $M = 2^{128}+1$. We will see that it is possible to find the convolution products rapidly by first computing their finite Fourier transform mod $M$, using the integer $\omega = 2$ as a 256th "root of unity." These integer calculations mod $M$ turn out to have all the necessary properties of complex roots of unity in the ordinary Fourier transform (32).

Arithmetic mod $(2^m+1)$ is somewhat similar to ones' complement arithmetic, mod $(2^m-1)$, although it is slightly more complicated; we have already investigated the idea briefly in Section 3.2.1.1. Numbers can be represented as $m$-bit quantities in binary notation, except for the special value $-1 \equiv 2^m$, which may be represented in some special way. Addition mod $(2^m+1)$ is easily done in $O(m)$ cycles, since a carry off the left end merely means we must subtract 1 at the right; similarly, subtraction mod $(2^m+1)$ is quite simple. Furthermore, we can multiply by $2^r$ in $O(m)$ cycles, when $0 \le r < m$, since

$$2^r\cdot(u_{m-1}\ldots u_1u_0)_2 \equiv (u_{m-r-1}\ldots u_0\,0\ldots0)_2 - (0\ldots0\,u_{m-1}\ldots u_{m-r})_2 \pmod{2^m+1}. \tag{36}$$
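Rule (36) and the end-around rule for addition fit in a few lines of Python. This is a sketch with hypothetical function names. The special value $-1$ is represented simply as the integer $2^m$, and Python's `%` performs the final correction into the range $0 \le x \le 2^m$. Both functions are checked against plain `%` arithmetic.

```python
def times_pow2_mod(u: int, r: int, m: int) -> int:
    """2**r * u mod (2**m + 1) by (36), for 0 <= u < 2**m and 0 <= r < m."""
    low = u & ((1 << (m - r)) - 1)   # bits u_{m-r-1} ... u_0
    high = u >> (m - r)              # bits u_{m-1} ... u_{m-r}
    # (36): shift the low bits left by r and subtract the bits that fell off;
    # the difference lies strictly between -2**r and 2**m, so one correction suffices
    return ((low << r) - high) % ((1 << m) + 1)

def add_mod(x: int, y: int, m: int) -> int:
    """x + y mod (2**m + 1) for 0 <= x, y <= 2**m (the value 2**m stands for -1)."""
    s = x + y
    if s >> m:                       # a carry off the left end, worth 2**m ≡ -1 ...
        s = s - (1 << m) - 1         # ... means subtracting 1 at the right
    return s % ((1 << m) + 1)
```

The point of (36) is that the work is one shift and one subtraction of $m$-bit quantities, hence $O(m)$, with no general multiplication.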
folio 388 galley 15
Given a sequence of $K = 2^k$ integers $(a_0, \ldots, a_{K-1})$, and an integer $\omega$ such that $\omega^K \equiv 1 \pmod M$, the integer Fourier transform

$$\hat a_s = \Bigl(\sum_{0\le t<K}\omega^{st}a_t\Bigr) \bmod M, \qquad 0 \le s < K, \tag{37}$$

can be calculated rapidly as follows. (In these formulas the $s_j$ and $t_j$ are either 0 or 1, so that each step represents $2^k$ computations.)

Step 0. Let $A^{[0]}(t_{k-1}, \ldots, t_0) = a_t$, where $t = (t_{k-1}\ldots t_0)_2$.

Step 1. Set

$$A^{[1]}(s_{k-1}, t_{k-2}, \ldots, t_0) = \Bigl(A^{[0]}(0, t_{k-2}, \ldots, t_0) + \omega^{(s_{k-1}0\ldots0)_2}\cdot A^{[0]}(1, t_{k-2}, \ldots, t_0)\Bigr) \bmod M.$$

Step 2. Set

$$A^{[2]}(s_{k-1}, s_{k-2}, t_{k-3}, \ldots, t_0) = \Bigl(A^{[1]}(s_{k-1}, 0, t_{k-3}, \ldots, t_0) + \omega^{(s_{k-2}s_{k-1}0\ldots0)_2}\cdot A^{[1]}(s_{k-1}, 1, t_{k-3}, \ldots, t_0)\Bigr) \bmod M.$$

Step $k$. Set

$$A^{[k]}(s_{k-1}, \ldots, s_1, s_0) = \Bigl(A^{[k-1]}(s_{k-1}, \ldots, s_1, 0) + \omega^{(s_0s_1\ldots s_{k-1})_2}\cdot A^{[k-1]}(s_{k-1}, \ldots, s_1, 1)\Bigr) \bmod M.$$

It is not difficult to prove by induction that we have

$$A^{[j]}(s_{k-1}, \ldots, s_{k-j}, t_{k-j-1}, \ldots, t_0) = \Bigl(\sum_{0\le t_{k-1}, \ldots, t_{k-j}\le 1}\omega^{(s_0s_1\ldots s_{k-1})_2\cdot(t_{k-1}\ldots t_{k-j}0\ldots0)_2}\,a_t\Bigr) \bmod M,$$

so that

$$A^{[k]}(s_{k-1}, \ldots, s_1, s_0) = \hat a_s, \qquad\text{where } s = (s_0s_1\ldots s_{k-1})_2.$$

(Note the reversed order of the binary digits in $s$. For further discussion of transforms such as this, see Section 4.6.4.)

Now we have enough machinery at our disposal to do the calculation of all $W_r$ as promised. Let $\omega = 2^{2^{l+3-k}}$, so that $\omega^K = 2^{8L} \equiv 1 \pmod M$, where $M = 2^{4L}+1$. The integer fast Fourier transform algorithm above can be applied to $(U_0, \ldots, U_{K-1})$ to obtain $(\hat U_0, \ldots, \hat U_{K-1})$; each of the $k$ steps involves $2^k$ computations of the form $c = (a + 2^eb) \bmod M$, so the running time is $O(k2^kL) = O(kN)$. Similarly we obtain $(\hat V_0, \ldots, \hat V_{K-1})$ in $O(kN)$ steps. The next step is to compute

$$(a_0, a_1, \ldots, a_{K-1}) = (\hat U_0\hat V_0, \hat U_1\hat V_1, \ldots, \hat U_{K-1}\hat V_{K-1}) \bmod M,$$

using a high-speed multiplication procedure for each of these products, obtaining the results mod $M$ by subtracting the most significant halves from the least significant halves. If we now use the fast Fourier transform a third time, obtaining $(\hat a_0, \hat a_1, \ldots, \hat a_{K-1})$, this is enough to determine $(W_0, W_1, \ldots, W_{K-1})$ without much more work, since we shall prove that

$$2^kW_r \equiv \hat a_{(-r)\bmod K} \pmod M. \tag{38}$$

This congruence means that an appropriate shifting operation, namely to multiply $-\hat a_{(-r)\bmod K}$ by $2^{4L-k}$ mod $M$ as in (36), finally yields $W_r$.
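Putting the pieces together, the following Python sketch follows Steps 0 through $k$ literally, applies them three times as described, and recovers each $W_r$ from (38). It is a small-scale illustration under stated assumptions. The function names are hypothetical. The twiddle factors $\omega^e$ come from Python's three-argument `pow` instead of from the shifts of (36). The $K$ pointwise products use the built-in `*` followed by `%`. No attention is paid to running time. The output of `integer_fft` is reordered by bit reversal so that entry $s$ is $\hat a_s$ of (37).

```python
def bit_reverse(x: int, width: int) -> int:
    """Reverse the lowest `width` bits of x."""
    y = 0
    for _ in range(width):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y

def integer_fft(a, omega, M):
    """Steps 0..k for K = 2**k values; returns [a_hat_0, ..., a_hat_{K-1}] of (37)."""
    K = len(a)
    k = K.bit_length() - 1
    A = [x % M for x in a]                        # Step 0: index (t_{k-1} ... t_0)_2
    for j in range(1, k + 1):                     # Step j
        pos = k - j                               # bit position of s_{k-j} in an index
        B = [0] * K
        for idx in range(K):                      # idx = (s_{k-1}..s_{k-j} t_{k-j-1}..t_0)_2
            lo = idx & ~(1 << pos)                # that argument replaced by 0
            hi = idx | (1 << pos)                 # that argument replaced by 1
            e = bit_reverse(idx >> pos, j) << pos # (s_{k-j} ... s_{k-1} 0 ... 0)_2
            B[idx] = (A[lo] + pow(omega, e, M) * A[hi]) % M
        A = B
    return [A[bit_reverse(s, k)] for s in range(K)]   # s = (s_0 s_1 ... s_{k-1})_2

def fft_multiply(u: int, v: int, n: int, k: int) -> int:
    """u*v for 0 <= u, v < 2**(2**n) via three integer FFTs and congruence (38)."""
    l = n + 1 - k
    assert 4 <= k <= l + 3                        # condition (33)
    K, L = 1 << k, 1 << l
    M = (1 << (4 * L)) + 1
    omega = 1 << (1 << (l + 3 - k))               # omega = 2**(2**(l+3-k))
    mask = (1 << L) - 1
    U = [(u >> (L * i)) & mask if i < K // 2 else 0 for i in range(K)]
    V = [(v >> (L * i)) & mask if i < K // 2 else 0 for i in range(K)]
    U_hat, V_hat = integer_fft(U, omega, M), integer_fft(V, omega, M)
    a = [(x * y) % M for x, y in zip(U_hat, V_hat)]
    a_hat = integer_fft(a, omega, M)
    # (38): W_r ≡ 2**(-k) a_hat_{(-r) mod K} ≡ -a_hat_{(-r) mod K} * 2**(4L-k) (mod M)
    W = [(-a_hat[(-r) % K] << (4 * L - k)) % M for r in range(K)]
    return sum(Wr << (r * L) for r, Wr in enumerate(W))   # (34)
```

Because $0 \le W_r < M$, the residue recovered in the last step is $W_r$ itself. That is why choosing $M$ larger than $(K/2)(2^L-1)^2$ suffices.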
All this may seem like magic, but it works; a careful study of the above remarks will show that the method is very clever but not a complete mystery. The proof of (38) relies primarily on the fact that $\omega^{K/2} \equiv -1$ (modulo $M$), because this fact can be used to prove that

$$\sum_{0\le t<K}\omega^{st} \equiv \begin{cases} K, & \text{if } s \bmod K = 0;\\ 0, & \text{if } s \bmod K \ne 0 \end{cases} \pmod{M}. \tag{39}$$

For when $s \bmod K \ne 0$, let $s \bmod K = 2^pq$ where $q$ is odd and $0 \le p < k$. Setting $T = 2^{k-1-p}$, we have $\omega^{sT} \equiv \omega^{qK/2} \equiv -1$, hence $\omega^{2sT} \equiv +1$. Since $2T$ divides $K$, the indices $0 \le t < K$ split into pairs $\{t, t+T\}$ with $t \bmod 2T < T$, and each pair contributes $\omega^{st}(1 + \omega^{sT}) \equiv 0$ to the sum. Applying (39), we find

$$\begin{aligned}
\hat a_s &\equiv \sum_{0\le t<K}\omega^{st}\hat U_t\hat V_t \equiv \sum_{0\le t,i,j<K}\omega^{st}\,\omega^{ti}U_i\,\omega^{tj}V_j\\
&\equiv \sum_{0\le i,j<K}U_iV_j\sum_{0\le t<K}\omega^{(s+i+j)t} \equiv K\sum_{\substack{0\le i,j<K\\ i+j+s\equiv 0\ (\text{modulo } K)}}U_iV_j.
\end{aligned}$$
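Congruence (39) and the chain of congruences just derived can be checked numerically on a small case. This Python sketch assumes $M = 2^{16} + 1$, $K = 8$, and $\omega = 2^{32/K} = 16$. These values are chosen only because they satisfy $\omega^{K/2} \equiv -1$. The transform is computed directly from its definition in $O(K^2)$ operations rather than by the fast method, and all names are hypothetical.

```python
M = (1 << 16) + 1            # 2**16 == -1 (mod M), so 2 has order 32
K = 8
omega = pow(2, 32 // K, M)   # omega**(K // 2) == 2**16 == -1 (mod M)


def transform(x):
    """Direct (slow) transform: entry s is the sum over t of omega**(s*t) * x[t], mod M."""
    return [sum(pow(omega, s * t, M) * x[t] for t in range(K)) % M for s in range(K)]


def check_39(s):
    """Left-hand side of (39) for one value of s, reduced mod M."""
    return sum(pow(omega, s * t, M) for t in range(K)) % M


def wrapped_sum(U, V, s):
    """K times the sum of U[i]*V[j] over i + j + s == 0 (mod K), reduced mod M."""
    return K * sum(U[i] * V[j] for i in range(K) for j in range(K)
                   if (i + j + s) % K == 0) % M


U = [3, 1, 4, 1, 0, 0, 0, 0]
V = [2, 7, 1, 8, 0, 0, 0, 0]
a = [u * v % M for u, v in zip(transform(U), transform(V))]   # a_t = U^_t V^_t
a_hat = transform(a)                                          # the third transform
```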
The multiplication procedure is nearly complete; it remains for us to specify $k$ and $l$, and to total up the amount of work involved. Let $M(n)$ denote the time it takes to multiply $2^n$-bit numbers by the above method, and let $M'(n) = M(n)/2^n$. The calculation time involves $O(kN)$ steps for the three Fourier transforms and the other operations of negligible cost, plus $2^k$ multiplications of integers in the interval $[0, 2^{4L}]$; hence we have

$$M(n) = 2^kM(l+2) + O(kN);\qquad M'(n) = 2M'(l+2) + O(k).$$

We get the best reduction of $M'(n)$ when $l$ is chosen to be as low as possible, consistent with (33), so we set

$$k = \lfloor n/2\rfloor + 2,\qquad l = \lceil n/2\rceil - 1. \tag{40}$$

We have proved that there is a constant $C$ such that

$$M'(n) \le 2M'\bigl(\lceil (n-2)/2\rceil + 2\bigr) + Cn,\qquad\text{for all } n \ge 4.$$

Iterating this relation (cf. exercise 1.2.4–35) yields

$$M'(n) \le 2^jM'\bigl(\lceil (n-2)/2^j\rceil + 2\bigr) + C\bigl(2^{j-1}j\lceil (n-2)/2^{j-1}\rceil + 2^{j+1} - 2\bigr),$$

for $j = 1, 2, \ldots, \lceil\lg(n-2)\rceil$; and $j = \lceil\lg(n-2)\rceil$ yields $M'(n) = O(n\log n)$.
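To see the iteration at work, the recurrence can be evaluated directly. This Python sketch treats the relation as an equality. It uses the hypothetical values $C = 1$ and $M'(n) = 1$ for $n < 4$, neither of which is taken from the text. It checks the iterated bound for each admissible $j$ and prints $M'(n)/(n\lg n)$, which stays bounded, as $O(n\log n)$ predicts.

```python
import math

C = 1      # hypothetical constant in the recurrence
BASE = 1   # hypothetical value of M'(n) for n < 4


def ceil_div(a: int, b: int) -> int:
    """Ceiling of a/b for integers, with b > 0."""
    return -(-a // b)


def m_prime(n: int) -> int:
    """Largest value allowed by M'(n) <= 2 M'(ceil((n-2)/2) + 2) + C*n."""
    if n < 4:
        return BASE
    return 2 * m_prime(ceil_div(n - 2, 2) + 2) + C * n


def iterated_bound(n: int, j: int) -> int:
    """Right-hand side of the relation obtained after j iterations."""
    inner = ceil_div(n - 2, 2**j) + 2
    extra = 2**(j - 1) * j * ceil_div(n - 2, 2**(j - 1)) + 2**(j + 1) - 2
    return 2**j * m_prime(inner) + C * extra


for n in (10, 1000, 10**6):
    print(n, m_prime(n) / (n * math.log2(n)))
```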
Since $M(n) = 2^nM'(n)$, this shows that $2^n$-bit numbers can be multiplied in $O(2^n\,n\log n)$ steps. Restated in terms of the number of bits, we have proved the main result of this section:

Theorem S (A. Schönhage, V. Strassen). It is possible to multiply two $n$-bit numbers in $O(n\log n\log\log n)$ steps.

Our formulation of the multiplication procedure was designed primarily for simplicity of exposition; it does not turn out to be an especially fast method for small $n$. For example, a lot of the bits which the above method deals with are known to be zero. Thus the algorithm needs to be refined somewhat if it is ever to become competitive with Algorithm C when $n$ is in a practical range. As $n \to \infty$, of course, fast Fourier multiplication becomes vastly superior to Algorithm C. John Pollard has presented a fast Fourier multiplication algorithm which is useful for moderately large $n$, in Math. Comp. 25 (1971), 365–374.
The word ``steps'' in Theorem S has been used somewhat loosely; we have implicitly been assuming a ``conventional computer'' with unlimited random-access memory, which takes one unit of time to read and write any bit. This assumption is quite unrealistic as $n \to \infty$, since we need $O(\log n)$ bits in an instruction or an index register just to be able to distinguish between $n$ memory cells, so the actual time to access memory on a ``conventional computer'' is really proportional to $\log n$. We often forget this dependence on $n$ because it does not occur on real machines with bounded memory and bounded register size. When $n$ becomes really large the only physically appropriate model seems to be a finite memory with a finite number of arbitrarily long tapes; the fast Fourier …
The difference between these computer models can be clarified by considering another method due to Schönhage and Strassen. If $n = 2^m \cdot m$, so that $m \approx \lg n$ and $2^m \approx n/\lg n$, it is possible to use the fast Fourier transform over the complex numbers to compute the product of two $n$-bit numbers by doing $O(m \cdot 2^m)$ multiplications of $6m$-bit numbers. Each of the latter can be broken into $12^2 = 144$ multiplications of $(\frac12 m)$-bit numbers. Now we can construct a multiplication table containing all products $xy$ with $0 \le x, y < 2^{m/2}$, by repeated addition, in $O(m \cdot 2^{m/2} \cdot 2^{m/2})$ steps. Then each of the $O(m \cdot 2^m)$ needed products can be done by table lookup in $O(m)$ steps. The total number of steps for this procedure therefore comes to $O(m^22^m) = O(n\log n)$. We have gotten rid of the factor $\log\log n$ in Theorem S, but the method really requires an unbounded random-access memory, since the table lookup cannot be done efficiently with a finite number of tapes. (Of course, a factor of $\log\log n$ is utterly negligible in practice; when $n$ changes from $10^9$ to $10^{18}$, $\lg\lg n$ increases by only one.)
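The bookkeeping of the table-lookup method can be sketched as follows. This Python illustration assumes $m = 8$, so the table holds all products of 4-bit pieces and each operand has $6m = 48$ bits, split into 12 pieces. The function names are hypothetical. Python's integers say nothing about random-access versus tape cost; the sketch only shows that 144 lookups plus shifted additions reproduce the product.

```python
def build_table(h: int):
    """All products x*y with 0 <= x, y < 2**h, built by repeated addition only."""
    size = 1 << h
    table = [[0] * size for _ in range(size)]
    for x in range(size):
        for y in range(1, size):
            table[x][y] = table[x][y - 1] + x   # x*y = x*(y-1) + x
    return table


def lookup_multiply(a: int, b: int, h: int, pieces: int, table) -> int:
    """Multiply two (pieces*h)-bit numbers using pieces**2 table lookups."""
    mask = (1 << h) - 1
    da = [(a >> (h * i)) & mask for i in range(pieces)]   # h-bit pieces of a
    db = [(b >> (h * i)) & mask for i in range(pieces)]   # h-bit pieces of b
    total = 0
    for i in range(pieces):
        for j in range(pieces):
            total += table[da[i]][db[j]] << (h * (i + j))
    return total


m = 8
h = m // 2                                  # pieces of m/2 bits
table = build_table(h)                      # 2**(m/2) * 2**(m/2) entries
a, b = 0xDEADBEEFCAFE, 0x123456789ABC       # two 6m = 48-bit operands
product = lookup_multiply(a, b, h, 12, table)   # 12 * 12 = 144 lookups
```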
Perhaps $O(n\log n\log\log n)$ will turn out to be the fastest achievable multiplication speed on the tape model, and $O(n\log n)$ on the unlimited random-access model; no such result has yet been proved. The best lower bound known to date is a rather deep theorem proved by Michael S. Paterson, Michael J. Fischer, and Albert R. Meyer [SIAM–AMS Proceedings 7 (1974), 97–111], based on techniques originally introduced by S. A. Cook and S. Aanderaa. It states that under certain restrictions there is no algorithm which multiplies $n$-bit numbers with an average of less than

$$O(n\log n/\log\log n) \tag{41}$$

operations. The restrictions under which (41) is a lower bound are rather severe: (a) the $(k+1)$st input symbols of the operands, from right to left, must not be read by the algorithm until after the $k$th output symbol has been produced; and (b) the internal tables kept by the algorithm must have a ``uniform'' structure, in an appropriate sense. The latter restriction rules out algorithms which use general List structures for their internal tables, and the first restriction rules out both Algorithm C and Algorithm S. It is still conceivable (though unlikely) that an algorithm which violates (a) or (b) could multiply $n$-bit numbers in $O(n)$ cycles. M. J. Fischer and L. J. Stockmeyer have shown [J. Computer and System Sciences 9 (1974), 317–331] that multiplication under
restrictions (a) and (b) is possible in $O(n(\log n)^2\log\log n)$ steps.

D. Division. Using a fast multiplication routine, we can now show that division can also be accomplished in $O(n\log n\log\log n)$ cycles.

To divide an $n$-bit number $u$ by an $n$-bit number $v$, we may first find an $n$-bit approximation to $1/v$, then multiply by $u$ to get an approximation $\hat q$ to $u/v$. Finally, one more multiplication lets us make the slight correction to $\hat q$ needed to ensure that $0 \le u - qv < v$. From this reasoning, we see that it suffices to have an algorithm which approximates the reciprocal of an $n$-bit number in $O(n\log n\log\log n)$ cycles. The following algorithm achieves this, using ``Newton's method'' as explained at the
end of Section 4.3.1:

Algorithm R (High-precision reciprocal). Let $v$ have the binary representation $v = (0.v_1v_2v_3\ldots)_2$, where $v_1 = 1$. This algorithm computes an approximation $z$ to $1/v$, which satisfies

$$|z - 1/v| \le 2^{-n}.$$

R1. [Initial approximation.] Set $z \leftarrow \frac14\lfloor 32/(4v_1 + 2v_2 + v_3)\rfloor$, $k \leftarrow 0$.

R2. [Newtonian iteration.] (At this point we have a number $z$ of the binary form $xx.xx\ldots x$ with $2^k + 1$ places after the radix point, and $z \le 2$.) Calculate $z^2 = xxx.xx\ldots x$ exactly, using a high-speed multiplication routine. Then calculate $V_kz^2$ exactly, where $V_k = (0.v_1v_2\ldots v_{2^{k+1}+3})_2$. Then set $z \leftarrow 2z - V_kz^2 + r$, where $0 \le r < 2^{-2^{k+1}-1}$ is added if necessary to ``round up'' $z$ so that it is a multiple of $2^{-2^{k+1}-1}$. Finally, set $k \leftarrow k + 1$.

R3. [Test for end.] If $2^k < n$, go back to step R2; otherwise the algorithm terminates.
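Algorithm R is easy to trace with exact rational arithmetic. The following Python sketch works under stated assumptions:

- $v$ is given as a Fraction with $\frac12 \le v < 1$;
- Python's built-in multiplication stands in for the high-speed multiplication routine;
- the names reciprocal and divide are hypothetical.

The function divide follows the division scheme described before the algorithm. It scales $v$ into $[\frac12, 1)$, approximates $1/v$, multiplies by $u$, and then corrects the approximate quotient by trial multiplications.

```python
import math
from fractions import Fraction


def reciprocal(v: Fraction, n: int) -> Fraction:
    """Algorithm R: return z with |z - 1/v| <= 2**-n, assuming 1/2 <= v < 1."""
    assert Fraction(1, 2) <= v < 1 and n >= 1
    v_prime = math.floor(v * 8)                 # (v1 v2 v3)_2, between 4 and 7
    z, k = Fraction(32 // v_prime, 4), 0        # R1: initial approximation
    while True:
        # R2: Newtonian iteration, with V_k = (0.v1 v2 ... v_{2^(k+1)+3})_2.
        digits = 2**(k + 1) + 3
        V_k = Fraction(math.floor(v * 2**digits), 2**digits)
        z = 2 * z - V_k * z * z                 # computed exactly
        places = 2**(k + 1) + 1                 # round up to a multiple of 2**-places
        z = Fraction(math.ceil(z * 2**places), 2**places)
        k += 1
        if 2**k >= n:                           # R3: test for end
            return z


def divide(u: int, v: int):
    """Return (q, u - q*v) with 0 <= u - q*v < v, using an approximate reciprocal."""
    assert u >= 0 and v > 0
    n = max(u.bit_length(), v.bit_length())
    s = v.bit_length()
    z = reciprocal(Fraction(v, 2**s), n + 2)    # approximates 2**s / v
    q = math.floor(u * z / 2**s)                # approximate quotient u/v
    while q * v > u:                            # the slight correction, if needed
        q -= 1
    while u - q * v >= v:
        q += 1
    return q, u - q * v
```

For instance, divide(10**30, 12345678901) should agree with Python's divmod.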
This algorithm is a modification of a method suggested by S. A. Cook. A similar technique has been used in computer hardware [see Anderson, Earle, Goldschmidt, and Powers, IBM J. Res. Dev. 11 (1967), 48–52]. Of course, it is necessary to check the accuracy of Algorithm R quite carefully, because it comes very close to being inaccurate. We will prove by induction that

$$z \le 2 \quad\text{and}\quad |z - 1/v| \le 2^{-2^k} \tag{42}$$

at the beginning and end of step R2.

For this purpose, let $\delta_k = 1/v - z_k$, where $z_k$ is the value of $z$ after $k$ iterations of step R2. To start the induction on $k$, we have

$$\delta_0 = 1/v - 8/v' + \bigl(32/v' - \lfloor 32/v'\rfloor\bigr)/4 = \eta_1 + \eta_2,$$

where $v' = (v_1v_2v_3)_2$. Here $\eta_1 = (v' - 8v)/vv'$ satisfies $-\frac12 < \eta_1 \le 0$, and $\eta_2 = (32/v' - \lfloor 32/v'\rfloor)/4$ satisfies $0 \le \eta_2 < \frac14$. Hence $|\delta_0| < \frac12$. Now suppose
(42) has been verified for $k$; then

$$\begin{aligned}
\delta_{k+1} = 1/v - z_{k+1} &= 1/v - z_k - z_k(1 - z_kV_k) - r\\
&= \delta_k - z_k(1 - z_kv) - z_k^2(v - V_k) - r\\
&= \delta_k - (1/v - \delta_k)v\delta_k - z_k^2(v - V_k) - r\\
&= v\delta_k^2 - z_k^2(v - V_k) - r.
\end{aligned}$$

Now

$$0 \le v\delta_k^2 \le \delta_k^2 \le \bigl(2^{-2^k}\bigr)^2 = 2^{-2^{k+1}},$$

and

$$0 \le z_k^2(v - V_k) + r < 4\bigl(2^{-2^{k+1}-3}\bigr) + 2^{-2^{k+1}-1} = 2^{-2^{k+1}},$$

so $|\delta_{k+1}| \le 2^{-2^{k+1}}$. We must still verify the first inequality of (42). To show that $z_{k+1} \le 2$, there are three cases:

(a) $V_k = \frac12$; then $z_{k+1} = 2$.

(b) $V_k \ne \frac12 = V_{k-1}$; then $z_k = 2$, so $2z_k - z_k^2V_k \le 2 - 2^{-2^{k+1}-1}$. Rounding up to a multiple of $2^{-2^{k+1}-1}$ cannot carry $z_{k+1}$ past this bound.

(c) $V_{k-1} \ne \frac12$; then $z_{k+1} = 1/v - \delta_{k+1} < 2 - 2^{-2^k-2} + 2^{-2^{k+1}} \le 2$, since $k > 0$.
The running time of Algorithm R is bounded by

$$2T(4n) + 2T(2n) + 2T(n) + 2T(\tfrac12 n) + \cdots$$

steps, where $T(n)$ is an upper bound on the time needed to do a multiplication of $n$-bit numbers. When $T(n) = Cn\log n\log\log n$, we have $T(4n) + T(2n) + T(n) + \cdots < T(8n)$, so division can be done with a speed comparable to that of multiplication except for a constant factor.

E. An even faster multiplication method. It is natural to wonder if multiplication of $n$-bit numbers can actually be accomplished in just $n$ steps; we have come from $n^2$ down to $n^{1+\epsilon}$, so perhaps we can squeeze the time down even more. This is still an unsolved problem, as pointed out above. But it is interesting to note that the best possible time, exactly $n$ cycles, can be achieved if we leave the domain of conventional computer programming and allow ourselves to build a computer which has an unlimited number of components all acting at once.
A linear iterative array of automata is a set of devices $M_1, M_2, M_3, \ldots$ which can each be in a finite set of ``states'' at each step of the computation. The machines $M_2, M_3, \ldots$ all have identical circuitry, and their state at time $t + 1$ is a function of their own state at time $t$ as well as the states of their left and right neighbors at time $t$. The first machine $M_1$ is slightly different: its state at time $t + 1$ is a function of its own state and that of $M_2$ at time $t$, and also of the input at time $t$. The output of a linear iterative array is a function defined on the states of $M_1$.

Let $u = (u_{n-1}\ldots u_1u_0)_2$, $v = (v_{n-1}\ldots v_1v_0)_2$, and $q = (q_{n-1}\ldots q_1q_0)_2$ be binary numbers, and let $uv + q = w = (w_{2n-1}\ldots w_1w_0)_2$. It is a remarkable fact that a linear iterative array can be constructed, independent of $n$, which will output $w_0, w_1, w_2, \ldots$ at times $1, 2, 3, \ldots$, if it is given the inputs $(u_0, v_0, q_0), (u_1, v_1, q_1), (u_2, v_2, q_2), \ldots$ at times $0, 1, 2, \ldots$.
We can state this phenomenon in the language of computer hardware by saying that it is possible to design a single ``integrated circuit module'' with the following property. If we wire together sufficiently many of these devices in a straight line, with each module communicating only with its left and right neighbors, the resulting circuitry will produce the $2n$-bit product of $n$-bit numbers in exactly $2n$ clock pulses.
Here is the basic idea behind this construction. At time 0, $M_1$ senses $(u_0, v_0, q_0)$, and it therefore is able to output $(u_0v_0 + q_0) \bmod 2$ at time 1. Then it sees $(u_1, v_1, q_1)$, and at time 2 it can output $(u_0v_1 + u_1v_0 + q_1 + k_1) \bmod 2$, where $k_1$ is the ``carry'' left over from the previous step. Next it sees $(u_2, v_2, q_2)$ and outputs $(u_0v_2 + u_1v_1 + u_2v_0 + q_2 + k_2) \bmod 2$. Furthermore, its state records the values of $u_2$ and $v_2$ so that machine $M_2$ will be able to sense these values at time 3, and $M_2$ will be able to compute $u_2v_2$ for the benefit of $M_1$ at time 4. Thus $M_1$ arranges to start $M_2$ multiplying the sequence $(u_2, v_2), (u_3, v_3), \ldots$, and $M_2$ will ultimately give $M_3$ the job of multiplying $(u_4, v_4), (u_5, v_5)$, etc. For …
Each automaton has $2^{11}$ states $(c, x_0, y_0, x_1, y_1, x, y, z_2, z_1, z_0)$, where $0 \le c < 4$ and each of the $x$'s, $y$'s, and $z$'s is either 0 or 1. Initially, all devices are in state $(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)$. Suppose that a machine $M_j$, $j > 1$, is in state $(c, x_0, y_0, x_1, y_1, x, y, z_2, z_1, z_0)$ at time $t$. Suppose also that its left neighbor $M_{j-1}$ is in state $(c^l, x_0^l, y_0^l, x_1^l, y_1^l, x^l, y^l, z_2^l, z_1^l, z_0^l)$ and its right neighbor $M_{j+1}$ is in state $(c^r, x_0^r, \ldots, z_0^r)$. Then at time $t + 1$ machine $M_j$ is in state $(c', x_0', y_0', x_1', y_1', x', y', z_2', z_1', z_0')$, where

$$\begin{array}{lll}
c' = \min(c+1, 3) & \text{if } c^l = 3, & 0 \text{ otherwise};\\
x_0' = x^l & \text{if } c = 0, & x_0 \text{ otherwise};\\
y_0' = y^l & \text{if } c = 0, & y_0 \text{ otherwise};\\
x_1' = x^l & \text{if } c = 1, & x_1 \text{ otherwise};\\
y_1' = y^l & \text{if } c = 1, & y_1 \text{ otherwise};\\
x' = x^l & \text{if } c \ge 2, & x \text{ otherwise};\\
y' = y^l & \text{if } c \ge 2, & y \text{ otherwise};
\end{array} \tag{43}$$

and $(z_2'z_1'z_0')_2$ is the binary notation for

$$z_0^r + z_1 + z_2^l + \begin{cases}
x^ly^l, & \text{if } c = 0;\\
x_0y^l + x^ly_0, & \text{if } c = 1;\\
x_0y^l + x_1y_1 + x^ly_0, & \text{if } c = 2;\\
x_0y^l + x_1y + xy_1 + x^ly_0, & \text{if } c = 3.
\end{cases} \tag{44}$$
The leftmost machine $M_1$ behaves in almost the same way as the others. When it is receiving the inputs $(u, v, q)$, it acts exactly as if there were a machine to its left in state $(3, 0, 0, 0, 0, u, v, q, 0, 0)$. The output of the array is the $z_0$ component of $M_1$.

Table 1 shows an example of this array acting on the inputs $u = v = (\ldots 00010111)_2$, $q = (\ldots 00001011)_2$. The output sequence appears in the lower right portion of the states of $M_1$: $0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, \ldots$. Read from right to left, this represents the number $(\ldots 01000011100)_2$, that is, $23 \cdot 23 + 11 = 540$.
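Rules (43) and (44) are concrete enough to simulate. The Python sketch below clocks a finite row of modules; its class and function names are hypothetical. It gives $M_1$ the imaginary left neighbor $(3, 0, 0, 0, 0, u, v, q, 0, 0)$ described above and collects the $z_0$ outputs of $M_1$. The array gets $2n + 1$ modules, more than an $n$-bit product needs. The sketch is meant only for replaying examples like Table 1, not for modeling circuit timing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """State (c, x0, y0, x1, y1, x, y, z2, z1, z0) of one automaton."""
    c: int = 0
    x0: int = 0
    y0: int = 0
    x1: int = 0
    y1: int = 0
    x: int = 0
    y: int = 0
    z2: int = 0
    z1: int = 0
    z0: int = 0


def next_state(me: State, left: State, right: State) -> State:
    """One clock pulse for a module, following rules (43) and (44)."""
    c = me.c
    new_c = min(c + 1, 3) if left.c == 3 else 0
    x0, y0 = (left.x, left.y) if c == 0 else (me.x0, me.y0)
    x1, y1 = (left.x, left.y) if c == 1 else (me.x1, me.y1)
    x, y = (left.x, left.y) if c >= 2 else (me.x, me.y)
    if c == 0:
        prod = left.x * left.y
    elif c == 1:
        prod = me.x0 * left.y + left.x * me.y0
    elif c == 2:
        prod = me.x0 * left.y + me.x1 * me.y1 + left.x * me.y0
    else:
        prod = me.x0 * left.y + me.x1 * me.y + me.x * me.y1 + left.x * me.y0
    total = right.z0 + me.z1 + left.z2 + prod          # at most 7
    return State(new_c, x0, y0, x1, y1, x, y, total >> 2, (total >> 1) & 1, total & 1)


def array_multiply(u: int, v: int, q: int, n: int) -> int:
    """Clock the array 2n times and assemble w = u*v + q from M1's outputs."""
    machines = [State() for _ in range(2 * n + 1)]     # more modules than needed
    w = 0
    for t in range(2 * n):
        # M1 acts as if its left neighbor were (3, 0, 0, 0, 0, u_t, v_t, q_t, 0, 0).
        virtual = State(c=3, x=(u >> t) & 1, y=(v >> t) & 1, z2=(q >> t) & 1)
        lefts = [virtual] + machines[:-1]
        rights = machines[1:] + [State()]
        machines = [next_state(m, l, r) for m, l, r in zip(machines, lefts, rights)]
        w |= machines[0].z0 << t                       # output w_t at time t + 1
    return w


print(array_multiply(0b10111, 0b10111, 0b1011, 8))
```

Calling array_multiply(0b10111, 0b10111, 0b1011, 8) feeds in the inputs of Table 1; the result should be $23 \cdot 23 + 11$.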
This construction is based on a similar one first published by A. J. Atrubin, IEEE Transactions EC-14 (1965), 394–399. S. Winograd [JACM 14 (1967), 793–802] has investigated the minimum multiplication time achievable in a logical circuit when $n$ is given and when the inputs are available all at once in coded form; see also C. S. Wallace, IEEE Trans. EC-13 (1964), 14–17.
R. P. Brent has shown that functions such as $\log x$, $\exp x$, and $\arctan x$ can be evaluated to $n$ significant bits in $O(n(\log n)^2\log\log n)$ steps, using high-speed multiplication [JACM, to appear].

EXERCISES
1. The idea expressed in (2) can be generalized to the decimal system, if the radix 2 is replaced by 10. Using this generalization, calculate 2718 times 4742. Reduce this product of four-digit numbers to three products of two-digit numbers, and reduce each of the latter to products of one-digit numbers.

2. [M] Prove that, in step C1 of Algorithm C, the value of $R$ either stays the same or increases by one when we set $R \leftarrow \lfloor\sqrt{Q}\rfloor$. (Therefore, as observed in that step, we need not calculate a square root.)
3. [M] Prove that the sequences $q_k$, $r_k$ defined in Algorithm C satisfy the inequality $2^{q_k+1}(2r_k)^{r_k} \le 2^{q_{k-1}+q_k}$, when $k > 0$.
4. [M] (K. Baker.) Show that it is advantageous to evaluate the polynomial $W(x)$ at the points $x = -r, \ldots, 0, \ldots, r$ instead of at the points $x = 0, 1, \ldots, 2r$ as in Algorithm C. The polynomial $U(x)$ can be written $U(x) = U_e(x^2) + xU_o(x^2)$, and similarly $V(x)$ and $W(x)$ can be expanded in this way. Show how to exploit this idea, obtaining faster calculations in steps C7 and C8.

5. [HM] Show that if in step C1 of Algorithm C we set $R \leftarrow \lceil\sqrt{2Q}\rceil + 1$ instead of $R \leftarrow \lfloor\sqrt{Q}\rfloor$, with suitable initial values of $q_0$, $q_1$, $r_0$, and $r_1$, then (19) can be improved to $t_k \le q_{k+1}2^{\sqrt{2\log_2 q_{k+1}}}(\log_2 q_{k+1})$.

6. [M] Prove that the six numbers in (22) are relatively prime in pairs.

7. [M] Prove (23).

8. [M] Why does the fast Fourier multiplication algorithm bother to work mod $(2^N + 1)$ instead of mod $(2^N - 1)$? It would seem to be much simpler to do everything mod $(2^N - 1)$, avoiding a lot of miscellaneous minus signs in the formulas, since $\omega = 2$ can be used to compute fast Fourier transforms mod $(2^{2^n} - 1)$. What would go wrong?

9. [M] What is $\hat{\hat u}_r$ (the result of two successive Fourier transforms (32))?

10. [M] Where is condition (33) used?

11. [M] If $n$ is fixed, how many of the automata in the linear iterative array (43), (44) are needed to compute the product of $n$-bit numbers? (Note that the automaton $M_j$ is only influenced by the component $z_0^r$ of the machine on its right, so we may remove all automata whose $z_0$ component is always zero whenever the inputs are $n$-bit numbers.)

12. [M] Improve on the lower bound (41); is it impossible for a general node-structure automaton (as described in Section 2.6) to multiply $n$-bit numbers in $O(n)$ cycles?

13. [M] (A. Schönhage.) What is a good upper bound on the time needed to multiply an $m$-bit number by an $n$-bit number, when both $m$ and $n$ are very large but $n$ is much larger than $m$, based on the results proved in this section for $m = n$?

14. [M] Write a program for Algorithm C, incorporating the improvements of exercise 4. Compare it with a program for Algorithm 4.3.1M and with a program based on (2), to see how large $n$ must be before Algorithm C is an improvement.

4.4. RADIX CONVERSION
If men had invented arithmetic by counting with their two fists or their eight fingers, instead of their ten ``digits,'' we would never have to worry about writing binary-decimal conversion routines. (And we would perhaps never have learned as much about number systems.) In this section, we shall discuss the conversion of …
Table 1. MULTIPLICATION IN A LINEAR ITERATIVE ARRAY (columns: Time; Input; and the states $(c, x_0, y_0, x_1, y_1, x, y, z_2, z_1, z_0)$ of Module $M_1$, Module $M_2$, and Module $M_3$).